86043838 DataStage Interview
TRANSCRIPT
What is a Data Warehouse?
What are Operational Databases?
Data Extraction?
Data Aggregation?
Data Transformation?
DataStage Designer
Advantages of Data Warehouse?
DataStage?
Client Components?
Server Components?
DataStage Jobs?
DataStage NLS?
Stages
Passive Stage?
Active Stage?
Server Job Stages
Parallel Job Stages
Links?
Parallel Processing
Types of Parallelism
Plug-in Stage?
Difference Between Lookup and Join:
What is Staging Variable?
What are Routines?
What are the Job parameters?
What are Stage Variables, Derivations and Constants?
Why is the fact table in normal form?
What are an Entity, Attribute and Relationship?
What is Metastage?
In how many places can you call Routines?
What about System variables?
What are all the third party tools
used in DataStage?
What is the difference between
change capture and change apply
stages
DataStage Engine Commands
What is the difference between
Transform and Routine in
DataStage?
Where can you output data using
the peek stage?
What is the Complex Flat File stage? In which
situations do we use it?
What is Ad-hoc query?
What is Version Control?
How Version Control Works?
Benefits of Using Version Control
Lookup types in Datastage 8
A data warehouse is a central integrated database containing data from all
the operational sources and archive systems in an organization. It contains
a copy of transaction data specifically structured for query analysis.
This database can be accessed by all users, ensuring that each group in an organization
is accessing valuable, stable data.
Operational databases are usually accessed by many concurrent users. The
data in the database changes quickly and often. It is very difficult to obtain
an accurate picture of the contents of the database at any one time.
Because operational databases are task oriented, for example, stock inventory
systems, they are likely to contain “dirty” data. The high throughput
of data into operational databases makes it difficult to trap mistakes or
incomplete entries. However, you can cleanse data before loading it into a
data warehouse, ensuring that you store only “good” complete records.
Data extraction is the process used to obtain data from operational sources, archives, and
external data sources.
Data aggregation summarizes many detail records into summary records. The summed (aggregated) total is stored in the data warehouse. Because the number of records stored in the data warehouse is greatly reduced, it is easier for the end user to browse and analyze the data.
Transformation is the process that converts data to a required definition and value.
Data is transformed using routines based on a transformation rule, for
example, product codes can be mapped to a common format using a transformation
rule that applies only to product codes.
After data has been transformed it can be loaded into the data warehouse
in a recognized and required format.
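A minimal Python sketch of such a transformation rule; the product codes and the mapping are invented for illustration:

```python
# Hypothetical transformation rule: map source-specific product codes
# to one common format before loading the warehouse.
CODE_MAP = {
    "PRD-001": "P0001",   # legacy system A format (invented)
    "1/A/7":   "P0007",   # legacy system B format (invented)
}

def transform_product_code(raw_code: str) -> str:
    """Apply the product-code rule; pass unknown codes through unchanged."""
    return CODE_MAP.get(raw_code.strip().upper(), raw_code)

print(transform_product_code("prd-001"))  # -> P0001
```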
Advantages of Data Warehouse:
• Capitalizes on the potential value of the organization’s information
• Improves the quality and accessibility of data
• Combines valuable archive data with the latest data in operational sources
• Increases the amount of information available to users
• Reduces the requirement of users to access operational data
• Reduces the strain on IT departments, as they can produce one database to serve all user groups
• Allows new reports and studies to be introduced without disrupting operational systems
• Promotes users to be self-sufficient
DataStage is a tool set that handles the design and processing required to build a data warehouse. It is an ETL tool that:
• Extracts data from any number or type of database.
• Transforms data. DataStage has a set of predefined transforms and functions you can use to convert your data.
You can easily extend the functionality by defining your own transforms to use.
• Loads the data warehouse.
It consists of a number of client components and server components.
DataStage server and parallel jobs are compiled and run on the DataStage server. The job will connect to databases on other machines as necessary,
extract data, process it, then write the data to the target data warehouse.
DataStage mainframe jobs are compiled and run on a mainframe. Data extracted by such jobs is then loaded into the data warehouse.
DataStage Designer -> A design interface used to create DataStage applications (known as jobs).
DataStage Director -> A user interface used to validate, schedule, run, and monitor DataStage server jobs and parallel jobs.
DataStage Manager -> A user interface used to view and edit the contents of the Repository.
DataStage Administrator -> A user interface used to perform administration tasks such as setting up DataStage users, creating and moving projects, and setting up purging criteria.
Repository -> A central store that contains all the information required to build a data mart or data warehouse.
DataStage Server -> Runs executable jobs that extract, transform, and load data into a data warehouse.
DataStage Package Installer -> A user interface used to install packaged DataStage jobs and plug-ins.
Basic types of DataStage jobs:
Server Jobs -> These are compiled and run on the DataStage server. A server job will connect to databases on other machines as necessary, extract data, process it, then write the data to the target data warehouse.
Parallel Jobs -> These are compiled and run on the DataStage server in a similar way to server jobs, but support parallel processing on SMP, MPP, and cluster systems.
Mainframe Jobs -> These are available only if you have Enterprise MVS Edition installed. A mainframe job is compiled and run on the mainframe. Data extracted by such jobs is then loaded into the data warehouse.
Shared Containers -> These are reusable job elements. They typically comprise a number of stages and links. Copies of shared containers can be used in any number of server jobs or parallel jobs and edited as required.
Job Sequences -> A job sequence allows you to specify a sequence of DataStage jobs to be executed, and actions to take depending on results.
Built-in Stages -> Supplied with DataStage and used for extracting, aggregating, transforming, or writing data. All types of job have these stages.
Plug-in Stages -> Additional stages that can be installed in DataStage to perform specialized tasks that the built-in stages do not support. Server jobs and parallel jobs can make use of these.
Job Sequence Stages -> Special built-in stages which allow you to define sequences of activities to run. Only job sequences have these.
DataStage has built-in National Language Support (NLS). With NLS installed,
DataStage can do the following:
• Process data in a wide range of languages
• Accept data in any character set into most DataStage fields
• Use local formats for dates, times, and money (server jobs)
• Sort data according to local rules
• Convert data between different encodings of the same language
(for example, for Japanese it can convert JIS to EUC)
A job consists of stages linked together which describe the flow of data
from a data source to a data target (for example, a final data warehouse).
The different types of job have different stage types. The stages that are
available in the DataStage Designer depend on the type of job that is
currently open in the Designer.
A passive stage handles access to databases for the extraction or writing of data.
Active stages model the flow of data and provide mechanisms for combining data streams, aggregating data, and converting data from one data type to another.
Database
ODBC -> Extracts data from or loads data into databases that support the industry-standard Open Database Connectivity API. This stage is also used as an intermediate stage for aggregating data. This is a passive stage.
UniVerse -> Extracts data from or loads data into UniVerse databases. This stage is also used as an intermediate stage for aggregating data. This is a passive stage.
UniData -> Extracts data from or loads data into UniData databases. This is a passive stage.
Oracle 7 Load -> Bulk loads an Oracle 7 database. Previously known as ORABULK.
Sybase BCP Load -> Bulk loads a Sybase 6 database. Previously known as BCPLoad.
File
Hashed File -> Extracts data from or loads data into databases that contain hashed files. Also acts as an intermediate stage for quick lookups. This is a passive stage.
Sequential File -> Extracts data from, or loads data into, operating system text files. This is a passive stage.
Processing
Aggregator -> Classifies incoming data into groups, computes totals and other summary functions for each group, and passes them to another stage in the job. This is an active stage.
BASIC Transformer -> Receives incoming data, transforms it in a variety of ways, and outputs it to another stage in the job. This is an active stage.
Folder -> Folder stages are used to read or write data as files in a directory located on the DataStage server.
Inter-process -> Provides a communication channel between DataStage processes running simultaneously in the same job. This is a passive stage.
Link Partitioner -> Allows you to partition a data set into up to 64 partitions. Enables server jobs to run in parallel on SMP systems. This is an active stage.
Link Collector -> Collects partitioned data from up to 64 partitions. Enables server jobs to run in parallel on SMP systems. This is an active stage.
Real Time
RTI Source -> Entry point for a job exposed as an RTI service. The table definition specified on the output link dictates the input arguments of the generated RTI service.
RTI Target -> Exit point for a job exposed as an RTI service. The table definition on the input link dictates the output arguments of the generated RTI service.
Containers
Server Shared Container -> Represents a group of stages and links. The group is replaced by a single Shared Container stage in the Diagram window.
Local Container -> Represents a group of stages and links. The group is replaced by a single Container stage in the Diagram window.
Container Input and Output -> Represent the interface that links a container stage to the rest of the job design.
Databases
DB2/UDB Enterprise. Allows you to read and write a
DB2 database.
Informix Enterprise. Allows you to read and write an
Informix XPS database.
Oracle Enterprise. Allows you to read and write an
Oracle database.
Teradata Enterprise. Allows you to read and write a
Teradata database.
Development/Debug Stages
Row Generator -> Generates a dummy data set.
Column Generator -> Adds extra columns to a data set.
Head -> Copies the specified number of records from the beginning of a data partition.
Peek -> Prints column values to the screen as records are copied from its input data set to one or more output data sets.
Sample -> Samples a data set.
Tail -> Copies the specified number of records from the end of a data partition.
Write Range Map -> Enables you to carry out range map partitioning on a data set.
File Stages
Complex Flat File -> Allows you to read or write complex flat files on a mainframe machine. This is intended for use on USS systems.
Data Set -> Stores a set of data.
External Source -> Allows a parallel job to read an external data source.
External Target -> Allows a parallel job to write to an external data source.
File Set -> A set of files used to store data.
Lookup File Set -> Provides storage for a lookup table.
SAS Data Set -> Provides storage for SAS data sets.
Sequential File -> Extracts data from, or writes data to, a text file.
Processing Stages
Transformer -> Receives incoming data, transforms it in a variety of ways, and outputs it to another stage in the job.
Aggregator -> Classifies incoming data into groups, computes totals and other summary functions for each group, and passes them to another stage in the job.
Change Apply -> Applies a set of captured changes to a data set.
Change Capture -> Compares two data sets and records the differences between them.
Compare -> Performs a column-by-column compare of two pre-sorted data sets.
Compress -> Compresses a data set.
Copy -> Copies a data set.
Decode -> Uses an operating system command to decode a previously encoded data set.
Difference -> Compares two data sets and works out the difference between them.
Encode -> Encodes a data set using an operating system command.
Expand -> Expands a previously compressed data set.
External Filter -> Uses an external program to filter a data set.
Filter -> Transfers, unmodified, the records of the input data set which satisfy requirements that you specify, and filters out all other records.
Funnel -> Copies multiple data sets to a single data set.
Generic -> Allows Orchestrate experts to specify their own custom commands.
Lookup -> Performs table lookups.
Merge -> Combines data sets.
Modify -> Alters the record schema of its input data set.
Remove Duplicates -> Removes duplicate entries from a data set.
SAS (Statistical Analysis System) -> Allows you to run SAS applications from within DataStage.
Sort -> Sorts input columns.
Switch -> Takes a single data set as input and assigns each input record to an output data set based on the value of a selector field.
Surrogate Key -> Generates one or more surrogate key columns and adds them to an existing data set.
Real Time
RTI Source -> Entry point for a job exposed as an RTI service. The table definition specified on the output link dictates the input arguments of the generated RTI service.
RTI Target -> Exit point for a job exposed as an RTI service. The table definition on the input link dictates the output arguments of the generated RTI service.
Restructure
Column Export -> Exports a column of another type to a string or binary column.
Column Import -> Imports a column from a string or binary column.
Combine Records -> Combines several columns associated by a key field to build a vector.
Make Subrecord -> Combines a number of vectors to form a subrecord.
Make Vector -> Combines a number of fields to form a vector.
Promote Subrecord -> Promotes the members of a subrecord to top-level fields.
Split Subrecord -> Separates a number of subrecords into top-level fields.
Split Vector -> Separates a number of vector members into separate columns.
Other Stages
Parallel Shared Container -> Represents a group of stages and links. The group is replaced by a single Parallel Shared Container stage in the Diagram window. Parallel Shared Container stages are handled differently from other stage types; they do not appear on the palette.
Local Container -> Represents a group of stages and links. The group is replaced by a single Container stage in the Diagram window.
Container Input and Output -> Represent the interface that links a container stage to the rest of the job design.
Links join the various stages in a job together and are used to specify how
data flows when the job is run.
Linking Server Stages:
Stream -> A link representing the flow of data. This is the principal type of link, and is used by both active and passive stages.
Reference -> A link representing a table lookup. Reference links are only used by active stages. They are used to provide information that might affect the way data is changed, but do not supply the data to be changed.
Linking Parallel Stages:
Stream -> A link representing the flow of data. This is the principal type of link, and is used by all stage types.
Reference -> A link representing a table lookup. Reference links can only be input to Lookup stages; they can only be output from certain types of stage.
Reject -> Some parallel job stages allow you to output records that have been rejected for some reason onto an output link.
Parallel processing is the ability to carry out multiple operations or tasks simultaneously.
Pipeline Parallelism
-> If we run a job on a system with at least three processors, the stage reading would start on one processor and start filling a pipeline with the data it had read.
-> The transformation stage would start running on a second processor as soon as there was data in the pipeline, process it, and start filling another pipeline.
-> The target stage would start running on a third processor as soon as there was data in that pipeline, writing it to the target.
Partitioning Parallelism
-> Using partitioning parallelism, the same job would effectively be run simultaneously by several processors, each handling a separate partition of the data.
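A minimal Python sketch of partitioning parallelism, assuming round-robin partitioning; the data and the transformation are invented stand-ins:

```python
# The same transformation runs simultaneously on several processors,
# each processing its own partition of the rows.
from multiprocessing import Pool

def transform(row: int) -> int:
    return row * 10          # stand-in for the real transformation stage

def run_partition(rows):
    return [transform(r) for r in rows]

if __name__ == "__main__":
    data = list(range(12))
    nparts = 3
    # round-robin partitioning: row i goes to partition i % nparts
    partitions = [data[i::nparts] for i in range(nparts)]
    with Pool(nparts) as pool:
        results = pool.map(run_partition, partitions)
    print(results)
```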
BULK COPY PROGRAM: Microsoft SQL Server and Sybase have a utility called BCP (Bulk Copy Program). This command-line utility copies SQL Server data to or from an operating system file in a user-specified format. BCP uses the bulk copy API in the SQL Server client libraries.
By using BCP, you can load large volumes of data into a table without recording each insert in a log file. You can run BCP manually from a command line using command-line options (switches). A format (.fmt) file defines the layout of the data in the operating system file.
The Orabulk stage is a plug-in stage supplied by Ascential. The Orabulk
plug-in is installed automatically when you install DataStage.
An Orabulk stage generates control and data files for bulk loading into a
single table on an Oracle target database. The files are suitable for loading
into the target database using the Oracle command sqlldr.
One input link provides a sequence of rows to load into an Oracle table.
The metadata for each input column determines how it is loaded. One
optional output link provides a copy of all input rows to allow easy
combination of this stage with other stages.
Lookup and join perform equivalent operations: combining two or more
input datasets based on one or more specified keys.
Lookup requires all but one (the first or primary) input to fit into physical memory. Join requires all inputs to be sorted.
When one unsorted input is very large or sorting isn't feasible, lookup is the preferred solution. When all inputs are of manageable size or are pre-sorted, join is the preferred solution.
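A rough Python sketch of the two approaches, with invented data: a lookup builds an in-memory hash table from the reference input, while a join merges two inputs pre-sorted on the key.

```python
ref = [(2, "CA"), (1, "NY")]          # small reference input
primary = [(2, "bob"), (1, "ann")]    # large primary input

# Lookup: the reference must fit in memory; the primary input can then
# stream through in any order.
ref_table = dict(ref)
lookup_out = [(k, name, ref_table.get(k)) for k, name in primary]

def sort_merge_join(left, right):
    """Merge two inputs pre-sorted on their first field (the key)."""
    out, j = [], 0
    for key, val in left:
        while j < len(right) and right[j][0] < key:
            j += 1
        if j < len(right) and right[j][0] == key:
            out.append((key, val, right[j][1]))
    return out

# Join: both inputs must be sorted on the key first.
join_out = sort_merge_join(sorted(primary), sorted(ref))
print(lookup_out)  # [(2, 'bob', 'CA'), (1, 'ann', 'NY')]
print(join_out)    # [(1, 'ann', 'NY'), (2, 'bob', 'CA')]
```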
These are temporary variables created in the Transformer stage for calculations.
Routines are functions which we develop in BASIC code for required tasks that DataStage's built-in functionality does not fully support (complex tasks).
These parameters are used to provide administrative access and to change run-time values of the job. Edit > Job Parameters: in the Parameters tab we can define the name, prompt, type, and value.
Stage Variable - An intermediate processing variable that retains its value during a read and does not pass the value into the target column.
Derivation - An expression that specifies the value to be passed on to the target column.
A fact table consists of measurements of business requirements and foreign keys of dimension tables, as per business rules.
An entity represents a chunk of information. In relational databases, an entity
often maps to a table.
An attribute is a component of an entity and helps define the uniqueness of
the entity. In relational databases, an attribute maps to a column.
MetaStage is a persistent metadata directory that uniquely synchronizes metadata across multiple separate silos, eliminating re-keying and the manual establishment of cross-tool relationships. Based on patented technology, it provides seamless cross-tool integration throughout the entire Business Intelligence life cycle.
Routines can be called in four places:
(i) Transform of a routine
(A) Date transformation
(B) Upstring transformation
(ii) Transform of the Before & After subroutines
(iii) XML transformation
(iv) Web-based transformation
DataStage provides a set of variables containing useful system information
that you can access from a transform or routine. System variables are read-
only.
@DATE The internal date when the program started. See the Date function.
@DAY The day of the month extracted from the value in @DATE.
@FALSE The compiler replaces the value with 0.
@FM A field mark, Char(254).
@IM An item mark, Char(255).
@INROWNUM Input row counter. For use in constraints and derivations in
Transformer stages.
@OUTROWNUM Output row counter (per link). For use in derivations in
Transformer stages.
Autosys, TNG, Event Coordinator, Maestro scheduler, and Control-M job scheduler are third-party tools used in DataStage projects.
The Change Capture stage is used to get the difference between two sources, i.e. the after dataset and the before dataset. The source which is used as a reference to capture the changes is called the after dataset. The source in which we are looking for the change is called the before dataset. Change Capture adds one field called "change_code" to its output. By this change code one can recognize which kind of change each record represents: delete, insert or update. Change Apply is the counterpart: it reads the change codes produced by Change Capture and applies the recorded changes to the before dataset to reproduce the after dataset.
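A hedged Python sketch of the comparison that Change Capture performs; the change-code numbering below (0 = copy, 1 = insert, 2 = delete, 3 = edit) follows the common convention but should be treated as an assumption, and the data is invented:

```python
def change_capture(before, after):
    """Compare before/after dicts keyed on the change key; emit change codes."""
    out = []
    for k in sorted(set(before) | set(after)):
        if k not in before:
            out.append((k, after[k], 1))      # insert
        elif k not in after:
            out.append((k, before[k], 2))     # delete
        elif before[k] != after[k]:
            out.append((k, after[k], 3))      # edit/update
        else:
            out.append((k, after[k], 0))      # unchanged copy
    return out

before = {1: "ann", 2: "bob"}
after  = {2: "bobby", 3: "carol"}
print(change_capture(before, after))
# [(1, 'ann', 2), (2, 'bobby', 3), (3, 'carol', 1)]
```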
The following commands can be taken as DS Engine commands, used to start and stop the DS Engine:
$DSHOME/bin/uv -admin -start
$DSHOME/bin/uv -admin -stop
Routines can return values; transforms cannot return values.
In the DataStage Director: look at the Director job log.
A Complex Flat File stage can be used to read the data at the initial level. By using CFF, we can read ASCII or EBCDIC (Extended Binary Coded Decimal Interchange Code) data. We can select the required columns and can omit the remaining. We can collect the rejects (badly formatted records) by setting the stage's reject property.
Ad hoc querying is a term in information science. Many application software
systems have an underlying database which can be accessed by only a limited
number of queries and reports. Typically these are available via some sort of
menu, and will have been carefully designed, pre-programmed and optimized
for performance by expert programmers.
By contrast, "ad hoc" reporting systems allow the users themselves to create
specific, customized queries. Typically this would be via a user-friendly GUI-
based system without the need for the in-depth knowledge of SQL, or
database schema that a programmer would have.
Because such reporting has the potential to severely degrade the performance of a live operational system, ad hoc queries are typically run against the data warehouse instead.
Version Control allows you to:
• Store different versions of DataStage jobs.
• Run different versions of the same job.
• Revert to a previous version of a job.
• View version histories.
• Ensure that everyone is using the same version of a job.
• Protect jobs by making them read-only.
• Store all changes in one centralized place.
Version Control utilizes the DataStage repository, and uses a specially created DataStage project (normally called 'VERSION') to store its information.
This special project stores all changes made to all the projects and jobs under its control.
Version Control is effective because it captures entire component releases, making it possible to view all changes between release levels.
Version Control also provides these benefits:
• Version tracking
• Central code repository
• DataStage integration
• Team coordination
Two types of Lookup: Range Lookup and Caseless Lookup
Job Sequence?
Activity Stages?
JOB SEQUENCE
Triggers?
Job Sequence Properties?
Job Report
How do you generate sequence numbers in DataStage?
Sequencers are job control programs that
execute other jobs with preset Job parameters.
DataStage provides a graphical Job Sequencer which allows you to specify
a sequence of server jobs or parallel jobs to run. The sequence can also
contain control information; for example, you can specify different courses
of action to take depending on whether a job in the sequence succeeds or
fails. Once you have defined a job sequence, it can be scheduled and run
using the DataStage Director. It appears in the DataStage Repository and
in the DataStage Director client as a job.
• Job. Specifies a DataStage server or parallel job.
• Routine. Specifies a routine. This can be any routine in
the DataStage Repository (but not transforms).
• ExecCommand. Specifies an operating system command
to execute.
• Email Notification. Specifies that an email notification
should be sent at this point of the sequence (uses SMTP).
• Wait-for-file. Waits for a specified file to appear or disappear.
• Exception Handler. There can only be one of these in a
job sequence. It is executed if a job in the sequence fails to
run (other exceptions are handled by triggers) or if the
job aborts and the Automatically handle job runs that
fail option is set for that job.
• Nested Conditions. Allows you to further branch the
execution of a sequence depending on a condition.
• Sequencer. Allows you to synchronize the control flow
of multiple activities in a job sequence.
• Terminator. Allows you to specify that, if certain situations
occur, the jobs a sequence is running shut down
cleanly.
• Start Loop and End Loop. Together these two stages
allow you to implement a For…Next or For…Each loop
within your sequence.
• User Variable. Allows you to define variables within a
sequence. These variables can then be used later on in
the sequence, for example to set job parameters.
The control flow in the sequence is dictated by how you interconnect
activity icons with triggers.
There are three types of trigger:
• Conditional. A conditional trigger fires the target activity if the
source activity fulfills the specified condition. The condition is
defined by an expression, and can be one of the following types:
– OK. Activity succeeds.
– Failed. Activity fails.
– Warnings. Activity produced warnings.
– ReturnValue. A routine or command has returned a value.
– Custom. Allows you to define a custom expression.
– User status. Allows you to define a custom status message to
write to the log.
• Unconditional. An unconditional trigger fires the target activity
once the source activity completes, regardless of what other triggers
are fired from the same activity.
• Otherwise. An otherwise trigger is used as a default where a
source activity has multiple output triggers, but none of the conditional
ones have fired.
General, Parameters, Job Control, Dependencies, NLS
The job reporting facility allows you to generate an HTML report of a
server, parallel, or mainframe job or shared containers. You can view this
report in a standard Internet browser (such as Microsoft Internet Explorer)
and print it from the browser.
The report contains an image of the job design followed by information
about the job or container and its stages. Hotlinks facilitate navigation
through the report. The following illustration shows the first page of an
example report, showing the job image and the contents list from which
you can link to more detailed job component descriptions. The report is not dynamic; if you change the job design you will need to regenerate the report.
Using the routines KeyMgtGetNextVal and KeyMgtGetNextValConn. It can also be done with an Oracle sequence.
A sequencer allows you to synchronize the control flow of multiple activities in a job sequence. It can have multiple input triggers as well as multiple output triggers. The sequencer operates in two modes:
ALL mode - In this mode all of the inputs to the sequencer must be TRUE for any of the sequencer outputs to fire.
ANY mode - In this mode, output triggers can be fired if any of the sequencer inputs are TRUE.
If we have 3 jobs in a sequencer and job 1 fails, how can we still run job 2 and job 3?
How do you remove duplicates using the Transformer stage in DataStage?
How will you call shell scripts in sequences in DataStage?
What are the environment variables in DataStage?
How to extract job parameters from a file?
Scenarios
How to get the unique records on multiple columns by using the Sequential File stage only?
If a column contains data like
abc,aaa,xyz,pwe,xok,abc,xyz,abc,pwe,abc,pwe,xok,xyz,xxx,abc,roy,pwe,aaa,xxx,xyz,roy,xok....
how to send the unique data to one target and the remaining data to another target?
How do you reduce warnings?
Is there any possibility to generate an alphanumeric surrogate key?
How to lock/unlock the jobs as DataStage admin?
How to enter a log record in an auditing table whenever a job finishes?
What is an audit table? Have you used an audit table in your project?
Can we use Round Robin for the Aggregator? Is there any benefit underlying?
How many reject links can the Merge stage have?
I have 3 jobs A, B and C, which are dependent on each other. I want to run the A & C jobs daily and the B job only on Sunday. How can we do it?
How to generate a surrogate key without using the Surrogate Key stage?
What are the push and pull techniques? If I want to import two sequential files to my desktop using the push technique, what should I do?
What are .dsx files?
How to capture rejected data by using the Join stage (not the Lookup stage)?
What is APT_DUMP_SCORE?
There are two tables, country and state. Table 1 has cid, cname; table 2 has sid, sname, cid. Based on cid, I want to display the countries having more than 25 states.
What is the difference between the 7.1, 7.5.2 and 8.1 versions of DataStage?
What are normalization and denormalization?
What is the difference between junk dimensions and conformed dimensions?
30 jobs are running in Unix. I want to find out my job. How to do this? Give me the command.
How do you convert columns to rows in DataStage?
What are environment variables?
Where does DataStage store its repository?
How can one source's columns or rows be loaded into two different tables?
How do you register plug-ins?
In how many ways can you implement SCD2? Explain them.
A sequential file has 8 records with one column; below are the values in the column separated by spaces: 1 1 2 2 3 4 5 6. In a parallel job, after reading the sequential file, 2 more sequential files should be created: one with the duplicate records and the other without duplicates. File 1 records separated by...
How to perform a left outer join and a right outer join in the Lookup stage?
What are the ways to read multiple files from the Sequential File stage if the files are different?
What happens if the job fails at night?
If there are 10000 records and, while loading, the session fails in between, how will you load the remaining data?
Tell me one situation from your last project where you faced a problem, and how did you solve it?
How to handle date conversions in DataStage? Convert a mm/dd/yyyy format to yyyy-dd-mm.
What is troubleshooting in server jobs? What are the different kinds of errors encountered while running any job?
What validations do you perform after creating jobs in Designer? What are the different types of errors you faced during loading, and how did you solve them?
If the size of the hash file exceeds 2 GB, what happens? Does it overwrite the current rows?
What is the purpose of the debugging stages? Where do we use them in real time?
How do you delete the header and footer on the source sequential file, and how do you create a header and footer on the target sequential file using DataStage?
Using a server job, how do you transform data in an XML file into a sequential file? I have used XML Input, XML Transformer, and a Sequential File stage.
How to develop SCD using the Lookup stage?
The source has 10000 records and the job failed after 5000 records were loaded. The status of the job is aborted. Instead of removing the 5000 records from the target, how can I resume the load?
If we are using two sources having the same metadata, how do we check whether the data in the two sources is the same or not? And if the data is not the same, I want to abort the job. How can we do this?
Scenario-based question: suppose 4 jobs are controlled by a sequencer (job 1, job 2, job 3, job 4). Job 1 has 10,000 rows, but after running the job only 5000 rows have been loaded into the target table, the rest are not loaded, and the job aborts. How can we sort out the problem?
Tell me the environment in your last projects.
Give the OS of the server and the OS of the client of your most recent project.
Where does a UNIX script of DataStage execute: on the client machine or on the server?
What are the repository tables in DataStage, and what are they for?
How does the hash file do a lookup in server jobs? How does it compare the key values?
How to extract data from more than one heterogeneous source (for example, one sequential file, Sybase, and Oracle) in a single job?
How can you do an incremental load in DataStage?
Job run reports generated by sequence jobs do not show the final error message.
To run a job even if its previous job in the sequence failed, you need to go to the TRIGGER tab of that particular job activity in the sequence itself.
There you will find three fields:
Name: This is the name of the next link (the link going to the next job; e.g. for job activity 1 the link name will be the link going to job activity 2).
Expression Type: This allows you to trigger your next job activity based on the status you want. For example, if job 1 fails and you want to run job 2 and job 3, then go to the trigger properties of job 1 and select the expression type "Failed - (Conditional)". This way you can run your job 2 even if your job 1 is aborted. There are many other options available.
Expression: This is editable for some options. For example, for expression type "Failed" you cannot change this field.
This should solve the problem.
In that case, double-click the Transformer stage and go to Stage Properties (the first icon in the header line). Go to Inputs, then Partitioning, and select a partitioning technique (anything other than Auto). Now enable Perform Sort, then enable Unique, and select the required column name(s). The output will now contain only unique values, so the duplicates are removed.
Shell scripts can be called in sequences by using the Execute Command activity. In this activity, type the following command:
bash /path/to/your/script/scriptname.sh
The bash command is used to run the shell script.
The environment variables in DataStage are predefined values (often paths) that the system can use as shortcuts when running programs. In most cases, environment variables are defined when the software is installed.
Could we use the dsjob command on a Linux or Unix platform to extract parameters from a job?
Scenarios
In the Sequential File stage there is an option called Filter; in this filter we can use whatever Unix commands we want. Go to Sequential File Properties -> Output -> Options -> set Filter.
By using the Sort stage. Go to Properties -> set Sorting Keys (key = column name) and set the option Allow Duplicates = false.
In order to reduce the warnings you need to get a clear idea about each particular warning. If you see a fix on the code or design side, apply it; otherwise go to the Director, select the warning, right-click and add a rule to the message handler, then click OK. From the next run onward you shouldn't see those warnings.
It is not possible to generate an alphanumeric surrogate key in DataStage.
This answer might satisfy you:
1. Open the Administrator.
2. Go to the Projects tab.
3. Click on the Command button.
4. Issue the LIST.READU command and press Execute (it gives you the status of all jobs; note the PID of the jobs you want to unlock).
5. Close that and come back to the command window.
6. Now issue the DS.TOOLS command and execute it.
7. Read the options given there and type option 4.
8. Then give 6 or 7 depending on your requirement.
9. Now give the PID that you noted before.
10. Then "yes".
11. Generally it won't work the first time; press 7 again and then give the PID again, and it will work.
Please get back to me if any further clarification is required.
Some companies use shell scripts to load logs into the audit table; other companies load logs into the audit table using DataStage jobs that we develop ourselves.
An audit table stores the log information; every job should have an audit table.
Yes, we can use Round Robin in the Aggregator. It is used for partitioning and collecting.
A Merge stage with n inputs can have n-1 reject links.
First, schedule the A & C jobs Monday to Saturday in one sequence. Next, take the three jobs according to their dependencies in one more sequence and schedule that sequence only on Sunday.
We can do it using the Transformer. To generate a sequence number there is a formula using the system variables:
@PARTITIONNUM + (@INROWNUM - 1) * @NUMPARTITIONS
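A quick Python check of why this formula yields unique numbers: each partition produces a disjoint arithmetic sequence, so no two partitions can collide.

```python
# @PARTITIONNUM is 0-based, @INROWNUM is 1-based.
num_partitions = 4
for partition_num in range(num_partitions):
    seq = [partition_num + (inrownum - 1) * num_partitions
           for inrownum in range(1, 4)]
    print(partition_num, seq)
# 0 [0, 4, 8]
# 1 [1, 5, 9]
# 2 [2, 6, 10]
# 3 [3, 7, 11]
```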
Push means the source team sends the data; pull means the developer extracts the data from the source.
A .dsx file is a DataStage export (project backup) file. When we want to load the project on another system or server, we take the file and import it on the other system/server.
We cannot capture the rejected data by using the Join stage. For that we can use a Transformer stage after the Join stage.
APT_DUMP_SCORE is a reporting environment variable, used to show how the data is being processed and how processes are being combined.
Join these two tables on cid and send all the columns to the output. Then, in an Aggregator stage, count rows with key column cid. Then use a Filter or Transformer to get the records with count > 25.
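The same join-then-aggregate-then-filter logic sketched in Python with invented data:

```python
from collections import Counter

countries = {1: "India", 2: "Monaco"}                    # cid -> cname (invented)
states = [(sid, f"state{sid}", 1) for sid in range(30)]  # 30 states for cid 1
states += [(100, "monaco-ville", 2)]

counts = Counter(cid for _, _, cid in states)            # Aggregator: count per cid
result = [countries[cid] for cid, n in counts.items() if n > 25]  # Filter: > 25
print(result)                                            # ['India']
```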
The main difference is that in 7.5 we can open a job only once at a time, whereas in 8.1 we can open one job multiple times in read-only mode. Another difference is that 8.1 has a Slowly Changing Dimension stage and a common repository.
Normalization controls redundancy by eliminating redundant data, whereas denormalization deliberately keeps redundant data.
Junk dimension:
A dimension which cannot be used to describe the facts on its own is known as a junk dimension (a junk dimension provides additional information to the main dimension).
Ex: customer address
Conformed dimension:
A dimension table which can be shared by multiple fact tables is known as a conformed dimension.
Ex: time dimension
ps -ef | grep USER_ID | grep JOB_NAME
Using the Pivot stage.
Basically, an environment variable is a predefined variable that we can use while creating a DS job. We can set it either at project level or at job level. Once we set a specific variable, that variable is available in the project/job.
DataStage stores its repository in the IBM UniVerse database.
For columns - we can directly map the single source's columns to two different targets.
For rows - we have to put some constraint (condition) on each target link.
Using DataStage Manager: Tools -> Register Plugin -> set the specific path -> OK.
There are 3 ways to construct SCD2 in DataStage 8.0.1:
1) using the SCD stage (a processing stage)
2) using the Change Capture and Change Apply stages
3) using a source file, lookup, transformers, filters, and surrogate key generation
Suppose we have the data 1 1 2 2 3 4 5. By using Sort we can send the duplicates into one link and the non-duplicates into another link: in the Sort stage, the keyChange column identifies the duplicates (keyChange = 1 for the first row of each key group, 0 for repeats), and a Transformer can then route the rows based on that value.
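A Python sketch of that keyChange-style routing (data invented): the first row of each sorted key group is flagged 1 and sent to one link, repeats are flagged 0 and sent to the other.

```python
data = [1, 1, 2, 2, 3, 4, 5]
unique_link, duplicate_link = [], []

prev = object()                 # sentinel: no previous key yet
for value in sorted(data):
    key_change = 1 if value != prev else 0
    (unique_link if key_change else duplicate_link).append(value)
    prev = value

print(unique_link)      # [1, 2, 3, 4, 5]
print(duplicate_link)   # [1, 2]
```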
In the Lookup stage properties you have the constraints option. If you click the Constraints button you get options like continue, drop, fail and reject.
If you select the option continue, a left outer join operation will be performed.
If you select the option drop, an inner join operation will be performed.
This can be achieved by selecting the File Pattern option and the path of the files in the Sequential File stage.
You can define a job sequence to send an email using the SMTP Notification activity if the job fails.
Or log the failure to a log file using DSLogFatal/DSLogEvent from a controlling job or an after-job routine.
Or use dsjob -log from the CLI.
Different companies use different strategies to recover the workflows:
1) You can use the session properties to recover from the last checkpoint.
2) Use a temporary table before every target and load it with the keys. When a job fails, you can identify the rows that were not loaded from the source by using these keys in a SQL override.
3) You can delete the rows that were loaded into the target by date, and restart the job from the beginning.
a) We had a big job with around 40 stages. The job was taking too long to compile and run. We broke the job into 3 smaller jobs. After this, we observed that the performance was slightly improved and maintenance of the jobs became easier.
b) We were facing problems deleting records using the OEE stage. We wrote a bulk delete statement instead of a record-by-record delete. It improved the performance of our job, and the deletion time reduced to 5 minutes; earlier the same job was taking 25 minutes.
etc.
I will explain how to convert a mm/dd/yyyy format to yyyy-dd-mm. Below is the format:
Oconv(Iconv(FieldName, "D/MDY[2,2,4]"), "D-YDM[4,2,2]")
Here, Iconv(FieldName, "D/MDY[2,2,4]") first converts the given date into the internal format; then Oconv(internal_date, "D-YDM[4,2,2]") converts the internal date format to the required yyyy-dd-mm.
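For comparison, the same conversion in Python, which makes the two-step parse/format intent of Iconv/Oconv explicit:

```python
from datetime import datetime

def convert(field: str) -> str:
    internal = datetime.strptime(field, "%m/%d/%Y")   # like Iconv(..., "D/MDY[2,2,4]")
    return internal.strftime("%Y-%d-%m")              # like Oconv(..., "D-YDM[4,2,2]")

print(convert("03/27/2011"))   # 2011-27-03
```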
Troubleshooting in DataStage server jobs involves monitoring the job log for fatal errors and taking appropriate actions to resolve them. Various errors can be encountered while running DS jobs, for example:
a) ORA-1400 error
b) Invalid user ID or password, login denied (from the OCI stage)
c) Error: dataset does not exist (parallel jobs)
d) The job may fail for a lookup failure, saying "lookup failed on a key column" (if the "fail" setting is chosen in the Lookup stage for lookup failures), etc.
I performed the following validations:
1) all letters should be in lowercase
2) the email ID field should not contain more than 255 characters
3) it should not contain special characters except underscore
While loading, I sometimes came across the following errors:
1) "unknown field name ..." because the metadata was not loaded properly; I reloaded the metadata and it worked fine
2) "data truncation warning", because the DataStage data type size was less than the size of the data type in the database
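A minimal Python sketch of those three validations; the exact character whitelist (lowercase letters, digits, underscore, plus the @ and dot an email needs) is an assumption:

```python
import re

def validate_email_field(value: str) -> bool:
    if value != value.lower():                 # 1) all letters lowercase
        return False
    if len(value) > 255:                       # 2) at most 255 characters
        return False
    # 3) no special characters except underscore (plus @ and dot, assumed)
    return re.fullmatch(r"[a-z0-9_@.]+", value) is not None

print(validate_email_field("john_doe@example.com"))  # True
print(validate_email_field("John@Example.com"))      # False
```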
When you create a hash file, by default in that directory we have 2 files:
data.30
over.30
If the data exceeds the specified limit, the extra data is written into over.30. It again depends upon storage capacity.
The main use of the debugging stages (Row Generator, Peek, Tail, Head, etc.) is that they are helpful for monitoring jobs, and they generate mock data when we don't have real data to test with.
In the Designer palette, under Development/Debug, we can find Head & Tail. By using these we can do it.
I will explain the stages used, in order:
FOLDER STAGE -> XML INPUT STAGE -> TRANSFORMER -> SEQUENTIAL FILE
The Folder stage checks the folder which has the XML file; give a wildcard such as *.xml. In the XML Input stage, load the columns from the XML importer, select only the values, and map them in the Transformer. That's it.
We can implement SCD by using the Lookup stage, but only for SCD1, not for SCD2.
We have to take the source (file or DB) and a dataset as the reference link (for the lookup), and then the Lookup stage. In this we compare the source with the dataset, giving the condition "continue, continue" there. After that, in the Transformer, we give the condition; then we take two targets, for insert and update, where we manually write the SQL insert and update statements.
If you see the design, you can easily understand it.
But we keep the extract, transform and load processes separate. Generally, the load job alone never fails unless there is a data issue; all data issues are cleared beforehand in the transform step.
There are some DB tools that do this automatically.
If you want to do this manually, keep track of the number of records in a hash file or text file, and update the file as you insert each record. If the job fails in the middle, read the number from the file and process the records from there only, ignoring the record numbers before it. Try the @INROWNUM variable for better results.
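A minimal Python sketch of that checkpoint idea; the file name and the loader function are invented for illustration:

```python
import os

CHECKPOINT = "load_checkpoint.txt"   # hypothetical checkpoint file

def insert_into_target(row):
    print("loaded", row)             # stand-in for the real load

def load_rows(rows):
    # Read how many rows a previous run already loaded.
    start = int(open(CHECKPOINT).read()) if os.path.exists(CHECKPOINT) else 0
    for i, row in enumerate(rows):
        if i < start:
            continue                 # already loaded in a previous run
        insert_into_target(row)
        with open(CHECKPOINT, "w") as f:   # update checkpoint per row
            f.write(str(i + 1))

load_rows(["r1", "r2", "r3"])
```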
Use a Change Capture stage and output it into a Transformer.
Write a routine that aborts the job, initiated at the function when @INROWNUM = 1. So if the data does not match, a row passes into the Transformer and the job is aborted.
![Page 66: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/66.jpg)
Suppose a job sequencer synchronizes or controls 4 jobs, but job 1 has a problem. In
this situation you should go to the Director and check what type of problem is showing:
a data type problem, a warning message, a job failure or a job abort. If the job fails, it
means a data type problem or a missing column action. So you should go to the Run window
-> Click -> Tracing -> Performance, or in your target table -> General -> Action, where
there are two options:
(i) On Fail -- Commit, Continue
(ii) On Skip -- Commit, Continue.
First check how much data has already loaded, then select the On Skip option and
Continue; for whatever remaining data is not loaded, select On Fail,
Continue. Run the job again and you will definitely get a success message.
The server is UNIX, and the client machine (i.e. your machine, where you design the jobs) is
Windows XP Professional.
DataStage jobs are executed on the server machines only. There is nothing that is
stored on the client machine.
A data warehouse is a repository (centralized as well as distributed) of data, able
to answer any ad-hoc, analytical, historical or complex queries. Metadata is data
about data. Examples of metadata include data element descriptions, data type
descriptions, attribute/property descriptions, range/domain descriptions, and
process/method descriptions. The repository environment encompasses all
corporate metadata resources: database catalogs, data dictionaries, and
navigation services. Metadata includes things like the name, length, valid values,
and description of a data element. Metadata is stored in a data dictionary and
repository. It insulates the data warehouse from changes in the schema of
operational systems.
In DataStage, for I/O and Transfer, under the Interface tab there are Input,
Output & Transfer pages. You will have 4 tabs, and the last one is Build; under that you
can find the TABLE NAME.
![Page 67: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/67.jpg)
The DataStage client components are:
Administrator - administers DataStage projects and conducts housekeeping on the server.
Designer - creates DataStage jobs that are compiled into executable programs.
Director - used to run and monitor the DataStage jobs.
Manager - allows you to view and edit the contents of the repository.
A hashed file is used for two purposes: 1. to remove duplicate records, 2. as a
reference for lookups. The hashed file contains 3 parts: each record has a
hashed key, a key header and a data portion. By using the hashing algorithm and the key
value, the lookup is faster.
You can convert all heterogeneous sources into sequential files & join them using
Merge,
or
you can write a user-defined query in the source itself to join them.
Incremental load means daily load.
Whenever you are selecting data from the source, select the records which were
loaded or updated between the timestamp of the last successful load and today's
load start date and time.
For this you have to pass parameters for those two dates:
store the last run date and time in a file, read it through a job
parameter, and state the second argument as the current date and time.
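For illustration, assuming the source table carries a last-modified timestamp column and the two dates arrive as job parameters (all names here are placeholders), the user-defined extract query might look like:

SELECT *
FROM orders
WHERE last_update_ts > '#LastRunTimestamp#'
AND last_update_ts <= '#JobStartTimestamp#'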
IBM® InfoSphere™ DataStage®: A sequence job collects job run information after
each job activity is run. This information can be written to the job log or sent by
email using the Notification Activity stage. If any stages or links in a job activity
produce warning or error messages from the job run, the last warning or error
message is retrieved and added to the report.
![Page 68: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/68.jpg)
What is DatawareHouse? Concept of Dataware
house?
What type of data available in Datawarehouse?
What is Node? What is Node Configuration?
What are the types of nodes in datastage?
![Page 69: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/69.jpg)
What is the use of Nodes
Fork-join
Execution flow
Conductor
Section
Player
![Page 70: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/70.jpg)
What are the descriptor file and data file in a Dataset?
What is Job Commit (in DataStage)?
What are the Iconv and Oconv functions?
How to Improve Performance of Datastage Jobs?
![Page 71: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/71.jpg)
Difference between Server Jobs and Parallel Jobs
![Page 72: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/72.jpg)
Difference between Datastage and Informatica.
What is a compiler? Compilation process in
DataStage
What is Modelling Of Datastage?
![Page 73: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/73.jpg)
Types Of Modelling ?
What is DataMart, Importance and Advantages?
![Page 74: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/74.jpg)
Data Warehouse vs. Data Mart
What are different types of error in datastage?
![Page 75: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/75.jpg)
What are the client components in DataStage 7.5x2
version?
![Page 76: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/76.jpg)
Difference Between 7.5x2 And 8.0.1?
![Page 77: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/77.jpg)
What is IBM Infosphere? And History
What is Datastage Project Contains?
![Page 78: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/78.jpg)
What is Difference Between Hash And Modulus
Technique?
What are Features of Datastage?
![Page 79: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/79.jpg)
ETL Project Phase?
What is RCP?
![Page 80: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/80.jpg)
What are the Roles And Responsibilities of a Software
Engineer?
Server Component of DataStage 7.5x2 version?
![Page 81: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/81.jpg)
How to create Group ID in Sort Stage?
What is a Fastly Changing Dimension?
Force Compilation?
![Page 82: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/82.jpg)
How many rows are sorted in the Sort stage by default in
server jobs?
When do we have to go for a Sequential File stage, and when
for a
Dataset, in DataStage?
What is the difference between the Switch and Filter stages in
DataStage?
Specify DataStage's strengths.
Symmetric multiprocessing (SMP)
![Page 83: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/83.jpg)
Briefly state the difference between a data warehouse &
a data mart?
What are System variables?
What are Sequencers?
What's the difference between an operational data store
(ODS) and a data warehouse?
What is the difference between a Hash file and a
Sequential File?
![Page 84: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/84.jpg)
What is OCI?
Which algorithm did you use for your hash file?
How to perform a left outer join and right outer join
in the Lookup stage?
What is the difference between DataStage and
DataStage Scripting?
Orchestrate Vs Datastage Parallel Extender?
The above might raise another question: why do we
have to load the dimensional tables first, then fact
tables:
![Page 85: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/85.jpg)
How to create batches in DataStage from the command
prompt?
How will performance be affected if we use a larger
number of Transformer stages in DataStage parallel
jobs?
![Page 86: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/86.jpg)
What various validations do you perform on the
data after extraction?
What are the PROFILE STAGE, QUALITY STAGE and AUDIT
STAGE in DataStage?
Please explain in detail.
How do you fix the error "OCI has fetched
truncated data" in DataStage?
Why is a hash file faster than a sequential file and ODBC
stage?
![Page 87: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/87.jpg)
How to fetch the last row from a particular column?
The input file may be a sequential file.
What is project life cycle and how do you
implement it?
What is an alternative way in which we can do job
control?
![Page 88: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/88.jpg)
Is it possible for two users to access the same job at a
time in DataStage?
How to kill a job in DataStage?
What is Integration & Unit testing in DataStage?
How do you clean the DataStage repository?
Give one real-time situation where the
Link Partitioner stage is used.
![Page 89: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/89.jpg)
What are the transaction size and array size in the OCI
stage? How can these be used?
How do you do Usage Analysis in DataStage?
![Page 90: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/90.jpg)
A data warehouse is a database which is used to store heterogeneous
sources of data, with characteristics like:
a) Subject Oriented
b) Historical Information
c) Integrated
d) Non Volatile
e) Time Variant
The source will be an Online Transaction Processing (OLTP) system. It collects the data from
OLTP systems, and it maintains the data for 30 - 90 days. It is
time sensitive. If we would like to store the data for a long period, we need a
permanent database: that is the Archival Database (AD).
Data in the data warehouse comes from the client systems. Data that you are
using to manage your business is very important for doing the manipulations
according to the client requirements.
A node is a logical CPU in DataStage.
Each node in a configuration file is distinguished by a virtual name and
defines a number, speed, CPUs, memory availability, etc.
Node configuration is a technique of creating logical CPUs.
The degree of parallelism of parallel jobs depends on the number of nodes
you define in your configuration file. Nodes are just processes created
logically by the OS.
Basically two types of nodes exist:
a) Conductor node: the DataStage engine is loaded into the conductor node.
b) Processing nodes: one section leader is created per node. Section leaders
fork the player processes.
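For illustration, a minimal one-node configuration file (the host name and the disk paths here are placeholders) looks roughly like this:

{
  node "node1"
  {
    fastname "etl_server"
    pools ""
    resource disk "/opt/ds/datasets" {pools ""}
    resource scratchdisk "/opt/ds/scratch" {pools ""}
  }
}

Adding further node entries to this file raises the degree of parallelism without changing the job design.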
![Page 91: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/91.jpg)
In a Grid environment, a node is the place where the jobs are executed.
Nodes are like processors; if we have more nodes when running the job, the
performance will be good, running in parallel to make the job efficient.
A job is split into N sub-jobs which are served by each of the N servers. After
service, a sub-job waits until all other sub-jobs have also been processed. The
sub-jobs are then rejoined and leave the system.
Actual data flows from player to player — the conductor and section leader
are only used to control process execution through control and message
channels.
* Conductor is the initial framework process. It creates the Section Leader (SL)
processes (one per node), consolidates messages to the DataStage log, and
manages orderly shutdown. The Conductor node has the start-up process.
The Conductor also communicates with the players.
* Section Leader is a process that forks player processes (one per stage) and
manages up/down communications. SLs communicate between the
conductor and player processes only. For a given parallel configuration file,
one section leader will be started for each logical node.
* Players are the actual processes associated with the stages. They send stderr
and stdout to the SL, establish connections to other players for data flow,
and clean up on completion. Each player has to be able to communicate
with every other player. There are separate communication channels
(pathways) for control, errors, messages and data. The data channel does
not go through the section leader/conductor as this would limit scalability.
Data flows directly from upstream operator to downstream operator.
![Page 92: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/92.jpg)
The descriptor and data files are the dataset files.
The descriptor file contains the schema details and the address of the data,
and the data file contains the data in the native format.
In the DRS stage we have a Transaction Isolation option, set to Read Committed,
and we set Array Size and Transaction Size to 10 and 2000 respectively, so that it
commits every 2000 records.
The Iconv and Oconv functions are used to convert date formats.
Iconv() is used to convert a string to the internal storage format.
Oconv() is used to convert an expression to an output format.
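For example (a sketch in DataStage BASIC, reusing the yyyy-dd-mm format string discussed earlier; the literal date is arbitrary):

InternalDate = Iconv("2024-25-03", "D-YDM[4,2,2]") ;* external yyyy-dd-mm string to internal day number
ExternalDate = Oconv(InternalDate, "D-YDM[4,2,2]") ;* internal day number back to yyyy-dd-mm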
The performance of the job is really important to maintain. Some of the
precautions for getting good performance from jobs are as follows. Avoid relying
on a single flow for performance testing or tuning; try to
work in increments, isolating and solving parts of the job one at a time.
![Page 93: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/93.jpg)
For that:
a) Avoid using the Transformer stage wherever possible. For example, if you
are using a Transformer stage only to change column names or to drop
columns, use a Copy stage rather than a Transformer stage. It will
give the job better performance.
b) Take care to choose the correct partitioning technique, according to the job and
requirement.
c) Use user-defined queries for extracting the data from databases.
d) If the data is less, use SQL join statements rather than a Lookup
stage.
e) If you have a large number of stages in the job, divide the job into multiple
jobs.
Server jobs work only if the server edition of DataStage has been installed on
your system. Server jobs do not support the parallelism and partitioning
techniques. Server jobs generate BASIC programs after job compilation.
Parallel jobs work if you have installed the Enterprise Edition. They work
on DataStage servers that are SMP (Symmetric Multi-Processing), MPP
(Massively Parallel Processing) etc. Parallel jobs generate OSH (Orchestrate
Shell) programs after job compilation. Different stages will be like datasets,
lookup stages etc.
Server jobs work in a sequential way, while parallel jobs work in a parallel
fashion (the Parallel Extender works on the principle of pipeline and partition) for
input/output processing.
![Page 94: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/94.jpg)
The difference between DataStage and Informatica is that
DataStage has partitioning, parallelism, Lookup, Merge etc.,
but Informatica doesn't have this concept of partitioning and parallelism, and file
lookup is really poor.
Compilation is the process of converting the GUI design into machine code, that
is, machine-understandable language.
In this process it checks all the link requirements, mandatory stage
property values, and whether there are any logical errors.
And the compiler produces OSH code.
Modeling is a logical and physical representation of the source system.
Modeling has two types of modeling tools.
They are:
ERWIN and ER-STUDIO.
In the source system there will be an ER model, and
in the target system there will be an ER model and a dimensional model.
Dimension: the table which is designed from the client's perspective. We can
look at dimension tables in many ways.
![Page 95: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/95.jpg)
And there are two types of modeling processes.
They are:
Forward Engineering (F.E)
Reverse Engineering (R.E)
F.E: F.E is the process of starting from scratch, e.g. for the banking
sector.
Ex: any bank which requires a data warehouse.
R.E: R.E is the process of altering an existing model for another bank.
A data mart is a repository of data gathered from operational data and other
sources that is designed to serve a particular community of knowledge
workers. In scope, the data may derive from an enterprise-wide database or
data warehouse or be more specialized. The emphasis of a data mart is on
meeting the specific demands of a particular group of knowledge users in
terms of analysis, content, presentation, and ease-of-use. Users of a data
mart can expect to have data presented in terms that are familiar.
There are many reasons to create a data mart, and a data mart has a lot of importance
and advantages:
It is easy to access frequently needed data from the database when required
by the client.
We can give a group of users access to view the data mart when it is
required. Of course the performance will be good.
It is easy to maintain and to create the data mart. It will be related to a specific
business.
And it is low cost to create a data mart, rather than creating a data warehouse
with a huge space.
![Page 96: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/96.jpg)
A data warehouse tends to be a strategic but somewhat unfinished
concept. The design of a data warehouse tends to start from an analysis of
what data already exists and how it can be collected in such a way that the
data can later be used. A data warehouse is a central aggregation of data
(which can be distributed physically);
A data mart tends to be tactical and aimed at meeting an immediate
need. The design of a data mart tends to start from an analysis of user needs.
A data mart is a data repository that may derive from a data warehouse or
not and that emphasizes ease of access and usability for a particular designed
purpose.
You may get many errors in DataStage while compiling or running the
jobs.
Some of the errors are as follows:
a) Source file not found -- if you are trying to read a file which is not there
with that name.
b) Sometimes you may get fatal errors.
c) Data type mismatches -- this will occur when data type mismatches occur
in the jobs.
d) Field size errors.
e) Metadata mismatch.
f) Data type size between source and target different.
g) Column mismatch.
h) Process time out -- if the server is busy, this error will come sometimes.
![Page 97: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/97.jpg)
In the Datastage 7.5X2 version, there are 4 client components. They are
1) Datastage Designer
2) Datastage Director
3) Datastage Manager
4) Datastage Admin
In Datastage Designer, We
Create the Jobs
Compile the Jobs
Run the Jobs
In Director, We can
View the Jobs
View the Logs
Batch Jobs
Unlock Jobs
Scheduling Jobs
Monitor the JOBS
Message Handling
![Page 98: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/98.jpg)
1) In Datastage 7.5X2, there are 4 client components. They are
a) Datastage Design
b) Datastage Director
c) Datastage Manager
d) Datastage Admin
And in
2) Datastage 8.0.1 Version, there are 5 components. They are
a) Datastage Design
b) Datastage Director
c) Datastage Admin
d) Web Console
e) Information Analyzer
Here the Datastage Manager is integrated into the Datastage Designer.
2) The Datastage 7.5X2 version is OS dependent; that is, OS users are Datastage
users.
![Page 99: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/99.jpg)
Datastage is a product owned by IBM.
Datastage is an ETL tool and it is platform independent.
ETL means Extracting, Transforming and Loading.
Datastage is the product introduced by a company called VMark, under the
name
DataIntegrator, in the UK in the year 1997.
And later it was acquired by other companies. Finally it reached IBM
in 2005.
Datastage got parallel capabilities when it was integrated with
Orchestrate,
and got platform-independent capabilities when integrated with the MKS
Toolkit.
Datastage is a comprehensive ETL tool. It is used to extract, transform
and load jobs. A Datastage project is worked on through the Datastage
clients. We can log in to the Datastage Designer in order to enter the Datastage
tool for Datastage jobs, designing of the jobs, etc.
Datastage jobs are maintained according to the project standards.
Every project contains the Datastage jobs, built-in components, table
definitions, repository and components required for the project.
![Page 100: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/100.jpg)
Hash and Modulus techniques are key-based partitioning techniques.
Hash and Modulus techniques are used for different purposes:
If the key column data type is textual, then we use the hash partitioning technique for
the job.
If the key column data type is numeric, we use the modulus partitioning technique.
If one key column is numeric and another is text, then we also use the hash partitioning
technique.
If both key columns are of numeric data types, then we use the modulus partitioning
technique.
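For example, in a 4-node configuration, modulus partitioning sends a row whose numeric key is 17 to node 17 mod 4 = 1, so all rows sharing a key value always land on the same node.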
1) Any to Any
That means Datastage can extract the data from any source and can load
the data into any target.
2) Platform Independent
A job developed on one platform can run on any other platform.
That means if we designed a job at the uniprocessing level, it can be run on
an SMP machine.
3) Node Configuration
Node configuration is a technique to create logical CPUs.
A node is a logical CPU.
4) Partition Parallelism
Partition parallelism is a technique of distributing the data across the nodes
based on partitioning techniques. Partitioning techniques are:
a) Key-based techniques:
1) Hash 2) Modulus 3) Range 4) DB2
![Page 101: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/101.jpg)
And the four phases are:
1) Data Profiling
2) Data Quality
3) Data Transformation
4) Metadata Management
Data Profiling:
Data profiling is performed in 5 steps. Data profiling analyses whether the
source data is good or dirty.
And these 5 steps are:
a) Column Analysis
b) Primary Key Analysis
c) Foreign Key Analysis
d) Cross-domain Analysis
e) Baseline Analysis
After completing the analysis, if the data is good, not a problem. If the data
is dirty, it will be sent for cleansing. This is done in the second phase.
Data Quality:
Data Quality, after getting the dirty data, will clean the data by using 5
RCP is nothing but Runtime Column Propagation. When we run Datastage
jobs, the columns may change from one stage to another stage. At that point
we would be loading unnecessary columns into a stage, which is
not required. If we want to load only the required columns into the target,
we can do this by enabling RCP. If we enable RCP, we can send the required
columns to the target.
![Page 102: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/102.jpg)
Roles and Responsibilities of Software Engineer are
1) Preparing Questions
2) Logical Designs ( i.e Flow Chart )
3) Physical Designs ( i.e Coding )
4) Unit Testing
5) Performance Tuning.
6) Peer Review
7) Design Turnover Document or Detailed Design Document or Technical
design Document
8) Doing Backups
9) Job Sequencing ( It is for Senior Developer )
There are three architecture components in Datastage 7.5x2.
They are:
Repository:
The repository is an environment where we create jobs, design, compile and run
them, etc.
Some components it contains are:
JOBS, TABLE DEFINITIONS, SHARED CONTAINERS, ROUTINES ETC.
Server (engine):
This runs the executable jobs that extract, transform and
load data into a data warehouse.
Datastage Package Installer:
A user interface used to install packaged Datastage jobs and plug-ins.
![Page 103: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/103.jpg)
Group IDs can be created in two different ways. We can create group IDs by
using:
a) Key Change Column
b) Cluster Key Change Column
Both of the options are used to create group IDs.
When we select either option and set it to True, it will create the group IDs group-wise.
Data will be divided into groups based on the key column, and it will give
1 for the first row of every group and 0 for the rest of the rows in all groups.
Key Change Column and Cluster Key Change Column are used based on the data
we are getting from the source:
if the data we are getting is not sorted, then we use Key Change Column to
create group IDs;
if the data we are getting is sorted, then we use Cluster Key Change
Column to create group IDs.
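For example, with sorted input key values A, A, B, B, B, the generated column would hold 1, 0, 1, 0, 0: a 1 for the first row of each group and a 0 for the rest.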
The entities in a dimension which change rapidly are
called a rapidly (fastly) changing dimension. The best example is
ATM machine transactions.
For parallel jobs there is also a force compile option. The compilation of
parallel jobs is by default optimized such that transformer stages only get
recompiled if they have changed since the last compilation. The force
compile option overrides this and causes all transformer stages in the job
to be compiled. To select this option:
• Choose File ➤ Force Compile
![Page 104: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/104.jpg)
10,000.
When the memory requirement is more, go for a Dataset; a
sequential file doesn't support more than 2 GB.
Filter: 1) We can write multiple conditions on multiple
fields.
2) It supports one input link and n output links.
Switch: 1) Multiple conditions on a single field (column).
2) It supports one input link and 128 output links.
The major strengths of Datastage are:
Partitioning,
pipelining,
Node configuration,
handles Huge volume of data,
Platform independent.
symmetric multiprocessing (SMP) involves a multiprocessor computer
hardware architecture where two or more identical processors are connected
to a single shared main memory and are controlled by a single OS instance.
Most common multiprocessor systems today use an SMP architecture.
![Page 105: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/105.jpg)
A data warehouse is made up of many data marts. A DWH contains many
subject areas. However, a data mart generally focuses on one subject area. E.g.
if there is a DWH for a bank, then there can be one data mart for accounts,
one for loans etc. These are high-level definitions.
A data mart (DM) is the access layer of the data warehouse (DW)
environment that is used to get data out to the users. The DM is a subset of
the DW, usually oriented to a specific business line or team.
System variables comprise a set of variables which are used to get system
information, and they can be accessed from a transformer or a routine. They
are read-only and start with an @.
A sequencer allows you to synchronize the control flow of multiple activities
in a job sequence. It can have multiple input triggers as well as multiple
output triggers.
A data warehouse is a decision-support database for organisational needs. It is
a subject-oriented, non-volatile, integrated, time-variant collection of data.
An ODS (Operational Data Store) is an integrated collection of related
information. It contains a maximum of 90 days of information.
An ODS is nothing but an operational data store, part of the transactional
database. This DB keeps integrated data from different transactional DBs and allows common
operations across the organisation, e.g. banking transactions.
In simple terms, an ODS is dynamic data.
A hash file stores the data based on a hash algorithm and a key value.
A sequential file is just a file with no key column.
A hash file can be used as a reference for a lookup;
a sequential file cannot.
![Page 106: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/106.jpg)
If you mean Oracle Call Interface (OCI), it is a set of low-level APIs used to
interact with Oracle databases. It allows one to use operations like logon,
execute, parse etc. from a C or C++ program.
It uses the GENERAL or SEQ.NUM. algorithm.
In the Lookup stage properties, you will have a Constraints option. If you click on
the Constraints button, you will get the options Continue, Drop, Fail and Reject.
If you select the option Continue, a left outer join operation will be
performed.
If you select the option Drop, an inner join operation will be performed.
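In SQL terms (the table and column names here are hypothetical), the two settings behave like:

-- Continue: left outer join
SELECT s.*, r.description
FROM source_data s
LEFT JOIN reference_data r ON s.key_col = r.key_col

-- Drop: inner join
SELECT s.*, r.description
FROM source_data s
INNER JOIN reference_data r ON s.key_col = r.key_col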
Datastage jobs, when compiled, generate OSH. OSH is the abbreviation of
Orchestrate scripting language. When a Datastage job is run, the generated
OSH is executed in the backend.
Orchestrate itself was an ETL tool with extensive parallel processing capabilities,
running on UNIX platforms. Datastage used Orchestrate with Datastage XE
(beta version of 6.0) to incorporate the parallel processing capabilities. Orchestrate
was then purchased and integrated with Datastage XE, and
a new version, Datastage 6.0, i.e. Parallel Extender, was released.
In a dimensional model, fact tables are dependent on the dimension
tables. This means that the fact table contains foreign keys to the dimension
tables. This is the reason dimension tables are loaded first and then the fact tables.
![Page 107: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/107.jpg)
From the command prompt, batches can be created in the following way:
a) Create a batch file, say RunbatchJobs.bat.
b) Open this file in Notepad.
c) Now write the command "dsjob" with the proper syntax for each job you want
to run (see the sketch below).
d) If there are four jobs to be run in a batch, use the dsjob command 4 times, with a
different job name on each line.
e) Save the file and close it.
f) Next time, whenever you want to run the jobs, just click on the batch file
RunbatchJobs.bat. All the jobs will be run one by one by the batch file.
Traditionally, batch programs are created in the following way:
A batch program is used to run a batch of jobs by writing server routine
code in the job control section. To generate a batch program, do the following:
a) Open Datastage Director.
b) Go to Tools -> Batch -> New.
c) A new window will open with the "Job Control" tab selected.
d) Write the routine code and save it. You may run multiple jobs in a batch by
making use of this.
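A minimal sketch of the RunbatchJobs.bat contents from the first method (the project and job names here are placeholders, and the available dsjob options vary by DataStage version):

dsjob -run -jobstatus MyProject Job1
dsjob -run -jobstatus MyProject Job2
dsjob -run -jobstatus MyProject Job3
dsjob -run -jobstatus MyProject Job4

The -jobstatus option makes each dsjob invocation wait for the job to finish, so the jobs run one after another.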
Transformer stages compile into C++ whereas other stages compile into OSH
(Orchestrate scripting language). If the number of Transformers is larger, the first
thing impacted is the compilation time: it will take more time to
compile the Transformer stages.
Practically, the Transformer stage does not really have a performance impact on DS
jobs.
If the number of stages in your jobs is larger, the performance will be
impacted (not necessarily by Transformer stages). Hence, try to implement the job
logic using the minimum number of stages in your DS jobs.
![Page 108: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/108.jpg)
NULL check
MetaData Check
Duplicate Check
Invalid Check
Profile stage: a profiling tool to investigate data sources to see inherent
structures, frequencies of phrases, identify data types, etc. In addition it can,
based on the real data rather than the metadata, suggest a data model for the
union of your data sources. This data model would be in 3NF.
Quality Stage: now embedded in Information Server; it provides
functionality for fuzzy-matching records and for standardizing record fields
based on predefined rules.
Audit stage: now a part of Information Analyzer. This part of IA can, based
on predefined rules, expose exceptions in your data from the required
format, contents and relationships.
This error occurs when the Oracle stage tries to fetch a column like 34.55676776...
when actually its data type is decimal(10,2). The solution here is to either
truncate or round the data to 2 decimal positions.
A hash file is a Datastage internal file. Data will be stored in memory,
and it works on a key column, so retrieval will be faster compared to hitting
the database.
![Page 109: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/109.jpg)
Develop a job: source Sequential File --> Transformer --> output stage.
In the Transformer, write a stage variable, rowcount, with the following
derivation:
go to DS Functions and click on DSGetLinkInfo;
you will get "DSGetLinkInfo(DSJ.ME,%Arg2%,%Arg3%,%Arg4%)".
Arg 2 is your source stage name,
Arg 3 is your source link name,
Arg 4 --> click DS Constant and select DSJ.LINKROWCOUNT.
Now your derivation is
"DSGetLinkInfo(DSJ.ME,"source","link", DSJ.LINKROWCOUNT)"
Create a constraint @INROWNUM = rowcount
and map the required column to the output link.
The project life cycle is related to the SDLC,
that is, the software development life cycle, which means there are 4 stages
involved:
1) Analysis
2) Development
3) Testing
4) Implementation
This covers the entire project life cycle!
Job control can be done using:
Datastage job Sequencers
Datastage Custom routines
Scripting
Scheduling tools like Autosys
![Page 110: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/110.jpg)
No chance; you have to kill the job process.
You can also do it by using Datastage Director's Clean Up Resources.
Unit Testing:
In the Datastage scenario, unit testing is the technique of testing
individual Datastage jobs for their functionality.
Integration Testing:
When two or more jobs are collectively tested for their
functionality, that is called integration testing.
Remove log files periodically, using the command CLEAR.FILE &PH&.
If we want to send more data from the source to the targets quickly, we will
be using the Link Partitioner stage in server jobs; we can make a maximum
of 64 partitions. This is an active stage. We can't connect two active
stages, but it is accepted only for this stage to connect to a Transformer or
Aggregator stage. The data sent from the Link Partitioner will be collected by
the Link Collector at a maximum of 64 partitions. This is also an active stage, so in
order to avoid the connection of an active stage from the Transformer to the Link
Collector, we will be using Inter Process Communication. As this is a passive
stage, by using it the data can be collected by the Link Collector. But we can use
Inter Process Communication only when the target is a passive stage.
![Page 111: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/111.jpg)
Transaction Size - This field exists for backward compatibility, but it is ignored
for release 3.0 and later of the Plug-in. The transaction size for new jobs is
now handled by Rows per transaction on the Transaction Handling tab on the
Input page.
Rows per transaction - The number of rows written before a commit is
executed for the transaction. The default value is 0, that is, all the rows are
written before being committed to the data table.
Array Size - The number of rows written to or read from the database at a
time. The default value is 1, that is, each row is written in a separate
statement.
1. If you want to know whether some job is a part of a sequence, then in the Manager
right-click the job and select Usage Analysis. It will show all the job's
dependents.
2. To find how many jobs are using a particular table.
3. To find how many jobs are using a particular routine.
Like this, you can find all the dependents of a particular object.
It is like nesting: you can move forward and backward and can see all the
dependents.
![Page 112: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/112.jpg)
SQL SELECT DISTINCT
SQL AND & OR Operators
SQL ORDER BY
SQL UPDATE
![Page 113: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/113.jpg)
SQL DELETE
SQL SUBQUERY
SQL CASE
![Page 114: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/114.jpg)
SQL TOP
SQL LIKE
SQL IN
SQL BETWEEN
![Page 115: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/115.jpg)
SQL Alias
SQL Joins
SQL INNER JOIN
SQL LEFT JOIN
![Page 116: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/116.jpg)
SQL RIGHT JOIN
SQL FULL JOIN
SQL UNION
SQL INTERSECT
![Page 117: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/117.jpg)
SQL MINUS
SQL LIMIT
SQL CREATE DATABASE
SQL CREATE TABLE
![Page 118: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/118.jpg)
SQL Constraints
SQL NOT NULL
SQL UNIQUE
![Page 119: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/119.jpg)
SQL PRIMARY KEY
![Page 120: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/120.jpg)
SQL FOREIGN KEY
SQL CHECK
![Page 121: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/121.jpg)
SQL DEFAULT
![Page 122: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/122.jpg)
SQL CREATE INDEX
SQL ALTER TABLE
SQL AUTO INCREMENT
![Page 123: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/123.jpg)
SQL Views
SQL Date Functions
![Page 124: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/124.jpg)
SQL NULL Values
SQL ISNULL VALUES
![Page 125: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/125.jpg)
SQL COALESCE FUNCTION
SQL IFNULL VALUES
![Page 126: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/126.jpg)
SQL NVL Function
SQL NULLIF FUNCTION
SQL RANK FUNCTION
![Page 127: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/127.jpg)
SQL RUNNINNG TOTAL
SQL PERCENT TOTAL
SQL CUMULATIVE PERCENT TOTAL
![Page 128: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/128.jpg)
SQL Functions
SQL AVG() Function
SQL COUNT() Function
![Page 129: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/129.jpg)
SQL FIRST() Function
SQL MAX() Function
SQL MIN() Function
![Page 130: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/130.jpg)
SQL SUM() Function
SQL GROUP BY Statement
SQL HAVING Clause
![Page 131: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/131.jpg)
SQL Upper() Function/UCASE
SQL lower() Function/LCASE
SQL MID() Function
SQL LENGTH() Function
SQL ROUND() Function
![Page 132: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/132.jpg)
SQL NOW() Function
Concatenate Function
Substring Function
STRING FUNCTION
![Page 133: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/133.jpg)
INSTR Function
Trim Function
![Page 134: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/134.jpg)
Length Function
Replace Function
DATEADD FUNCTION
DATEDIFF FUNCTION
DATEPART FUNCTION
DATE FUNCTION (SQL SERVER)
![Page 135: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/135.jpg)
GETDATE FUNCTION
SYSDATE FUNCTION
![Page 136: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/136.jpg)
In a table, some of the columns may contain duplicate values. This is not a
problem; however, sometimes you will want to list only the different (distinct)
values in a table.
The DISTINCT keyword can be used to return only distinct (different) values.
SELECT DISTINCT column_name(s)
FROM table_name
The AND operator displays a record if both the first condition and the second
condition are true.
The OR operator displays a record if either the first condition or the second
condition is true.
AND
SELECT * FROM Persons
WHERE FirstName='Tove'
AND LastName='Svendson'
OR
SELECT * FROM Persons
WHERE FirstName='Tove'
OR FirstName='Ola'
The ORDER BY keyword is used to sort the result-set by a specified column.
The ORDER BY keyword sorts the records in ascending order by default.
If you want to sort the records in a descending order, you can use the DESC
keyword.
SQL ORDER BY Syntax
SELECT column_name(s)
FROM table_name
ORDER BY column_name(s) ASC|DESC
The UPDATE statement is used to update records in a table.
UPDATE table_name
SET column1=value, column2=value2,...
WHERE some_column=some_value
![Page 137: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/137.jpg)
The DELETE statement is used to delete records in a table.
DELETE FROM table_name
WHERE some_column=some_value
It is possible to embed an SQL statement within another. When this is done in the
WHERE or the HAVING clause, we have a subquery construct.
The syntax is as follows:
SELECT "column_name1"
FROM "table_name1"
WHERE "column_name2" [Comparison Operator]
(SELECT "column_name3"
FROM "table_name2"
WHERE [Condition])
Case is used to provide if-then-else type of logic to SQL. Its syntax is:
SELECT CASE ("column_name")
WHEN "condition1" THEN "result1"
WHEN "condition2" THEN "result2"
...
[ELSE "resultN"]
END
FROM "table_name"
"condition" can be a static value or an expression. The ELSE clause is optional.
Example:
SELECT store_name, CASE store_name
WHEN 'Los Angeles' THEN Sales * 2
WHEN 'San Diego' THEN Sales * 1.5
ELSE Sales
END "New Sales", Date
FROM Store_Information
![Page 138: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/138.jpg)
The TOP clause is used to specify the number of records to return. In SQL Server
the syntax is SELECT TOP number column_name(s) FROM table_name;
in Oracle, ROWNUM is used instead:
SELECT column_name(s)
FROM table_name
WHERE ROWNUM <= number
The LIKE operator is used in a WHERE clause to search for a specified pattern in a
column.
Cities starting with the character 's':
SELECT * FROM Persons
WHERE City LIKE 's%'
Cities ending with the character 's':
SELECT * FROM Persons
WHERE City LIKE '%s'
Cities which do not contain 'tav':
SELECT * FROM Persons
WHERE City NOT LIKE '%tav%'
The IN operator allows you to specify multiple values in a WHERE clause.
SQL IN Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1,value2,...)
The BETWEEN operator is used in a WHERE clause to select a range of data
between two values.
SQL BETWEEN Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name
BETWEEN value1 AND value2
![Page 139: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/139.jpg)
With SQL, an alias name can be given to a table or to a column.
SQL Alias Syntax for Tables
SELECT column_name(s)
FROM table_name
AS alias_name
SQL Alias Syntax for Columns
SELECT column_name AS alias_name
FROM table_name
SQL joins are used to query data from two or more tables, based on a
relationship between certain columns in these tables.
The INNER JOIN keyword returns rows when there is at least one match in both
tables.
SQL INNER JOIN Syntax
SELECT column_name(s)
FROM table_name1
INNER JOIN table_name2
ON table_name1.column_name=table_name2.column_name
The LEFT JOIN keyword returns all rows from the left table (table_name1), even
if there are no matches in the right table (table_name2).
SQL LEFT JOIN Syntax
SELECT column_name(s)
FROM table_name1
LEFT JOIN table_name2
ON table_name1.column_name=table_name2.column_name
![Page 140: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/140.jpg)
The RIGHT JOIN keyword returns all the rows from the right table (table_name2),
even if there are no matches in the left table (table_name1).
SQL RIGHT JOIN Syntax
SELECT column_name(s)
FROM table_name1
RIGHT JOIN table_name2
ON table_name1.column_name=table_name2.column_name
The FULL JOIN keyword returns rows when there is a match in one of the tables.
SQL FULL JOIN Syntax
SELECT column_name(s)
FROM table_name1
FULL JOIN table_name2
ON table_name1.column_name=table_name2.column_name
The UNION operator is used to combine the result-set of two or more SELECT
statements.
Notice that each SELECT statement within the UNION must have the same
number of columns. The columns must also have similar data types. Also, the
columns in each SELECT statement must be in the same order.
SQL UNION Syntax
SELECT column_name(s) FROM table_name1
UNION
SELECT column_name(s) FROM table_name2
Similar to the UNION command, INTERSECT also operates on two SQL
statements. The difference is that, while UNION essentially acts as an OR
operator (value is selected if it appears in either the first or the second
statement), the INTERSECT command acts as an AND operator (value is selected
only if it appears in both statements).
The syntax is as follows:
[SQL Statement 1]
INTERSECT
[SQL Statement 2]
![Page 141: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/141.jpg)
MINUS operates on two SQL statements. It takes all the results from the first
SQL statement, and then subtracts the ones that are present in the second
SQL statement, to get the final answer. If the second SQL statement includes
results not present in the first SQL statement, such results are ignored.
The syntax is as follows:
[SQL Statement 1]
MINUS
[SQL Statement 2]
We may not want to retrieve all the records that satisfy the criteria specified in
the WHERE or HAVING clauses.
In MySQL, this is accomplished using the LIMIT keyword. The syntax for LIMIT is
as follows:
[SQL Statement 1]
LIMIT [N]
The CREATE DATABASE statement is used to create a database.
SQL CREATE DATABASE Syntax
CREATE DATABASE database_name
The CREATE TABLE statement is used to create a table in a database.
SQL CREATE TABLE Syntax
CREATE TABLE table_name
(
column_name1 data_type,
column_name2 data_type,
column_name3 data_type,
....
)
![Page 142: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/142.jpg)
Constraints are used to limit the type of data that can go into a table.
Constraints can be specified when a table is created (with the CREATE TABLE
statement) or after the table is created (with the ALTER TABLE statement).
We will focus on the following constraints:
NOT NULL
UNIQUE
PRIMARY KEY
FOREIGN KEY
CHECK
DEFAULT
The NOT NULL constraint enforces a column to NOT accept NULL values.
CREATE TABLE Persons
(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)
The UNIQUE constraint uniquely identifies each record in a database table.
The UNIQUE and PRIMARY KEY constraints both provide a guarantee for
uniqueness for a column or set of columns.
A PRIMARY KEY constraint automatically has a UNIQUE constraint defined on it.
Note that you can have many UNIQUE constraints per table, but only one
PRIMARY KEY constraint per table.
![Page 143: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/143.jpg)
CREATE TABLE Persons
(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
CONSTRAINT uc_PersonID UNIQUE (P_Id,LastName)
)
SQL UNIQUE Constraint on ALTER TABLE
ALTER TABLE Persons
ADD CONSTRAINT uc_PersonID UNIQUE (P_Id,LastName)
To DROP a UNIQUE Constraint
ALTER TABLE Persons
DROP CONSTRAINT uc_PersonID
The PRIMARY KEY constraint uniquely identifies each record in a database table.
Primary keys must contain unique values.
A primary key column cannot contain NULL values.
Each table should have a primary key, and each table can have only ONE primary
key.
CREATE TABLE Persons
(
P_Id int NOT NULL PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)
SQL PRIMARY KEY Constraint on ALTER TABLE
ALTER TABLE Persons
ADD CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName)
To DROP a PRIMARY KEY Constraint
ALTER TABLE Persons
DROP CONSTRAINT pk_PersonID
![Page 144: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/144.jpg)
A FOREIGN KEY in one table points to a PRIMARY KEY in another table.
CREATE TABLE Orders
(
O_Id int NOT NULL PRIMARY KEY,
OrderNo int NOT NULL,
P_Id int FOREIGN KEY REFERENCES Persons(P_Id)
)
SQL FOREIGN KEY Constraint on ALTER TABLE
To create a FOREIGN KEY constraint on the "P_Id" column when the "Orders"
table is already created, use the following SQL:
ALTER TABLE Orders
ADD CONSTRAINT fk_PerOrders
FOREIGN KEY (P_Id)
REFERENCES Persons(P_Id)
To DROP a FOREIGN KEY Constraint
ALTER TABLE Orders
DROP CONSTRAINT fk_PerOrders
The CHECK constraint is used to limit the value range that can be placed in a
column.
If you define a CHECK constraint on a single column it allows only certain values
for this column.
If you define a CHECK constraint on a table it can limit the values in certain
columns based on values in other columns in the row.
CREATE TABLE Persons
(
P_Id int NOT NULL CHECK (P_Id>0),
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)
![Page 145: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/145.jpg)
SQL CHECK Constraint on ALTER TABLE
To create a CHECK constraint on the "P_Id" column when the table is already
created, use the following SQL:
ALTER TABLE Persons
ADD CONSTRAINT chk_Person CHECK (P_Id>0 AND City='Sandnes')
To DROP a CHECK Constraint
To drop a CHECK constraint, use the following SQL:
SQL Server / Oracle / MS Access:
ALTER TABLE Persons
DROP CONSTRAINT chk_Person
The DEFAULT constraint is used to insert a default value into a column.
The default value will be added to all new records, if no other value is specified.
CREATE TABLE Persons
(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255) DEFAULT 'Sandnes'
)
SQL DEFAULT Constraint on ALTER TABLE
ALTER TABLE Persons
ALTER COLUMN City SET DEFAULT 'SANDNES'
To DROP a DEFAULT Constraint
ALTER TABLE Persons
ALTER COLUMN City DROP DEFAULT
![Page 146: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/146.jpg)
An index can be created in a table to find data more quickly and efficiently.
The users cannot see the indexes; they are just used to speed up
searches/queries.
Note: Updating a table with indexes takes more time than updating a table
without (because the indexes also need an update). So you should only create
indexes on columns (and tables) that will be frequently searched against.
SQL CREATE INDEX Syntax
Creates an index on a table. Duplicate values are allowed:
CREATE INDEX index_name
ON table_name (column_name)
SQL CREATE UNIQUE INDEX Syntax
Creates a unique index on a table. Duplicate values are not allowed:
CREATE UNIQUE INDEX index_name
ON table_name (column_name)
The ALTER TABLE statement is used to add, delete, or modify columns in an
existing table.
SQL ALTER TABLE Syntax
To add a column in a table, use the following syntax:
ALTER TABLE table_name
ADD column_name datatype
Very often we would like the value of the primary key field to be created
automatically every time a new record is inserted.
We would like to create an auto-increment field in a table.
In Oracle, use the following CREATE SEQUENCE syntax:
CREATE SEQUENCE seq_person
MINVALUE 1
START WITH 1
INCREMENT BY 1
CACHE 10
![Page 147: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/147.jpg)
In SQL, a view is a virtual table based on the result-set of an SQL statement.
A view contains rows and columns, just like a real table. The fields in a view are
fields from one or more real tables in the database.
You can add SQL functions, WHERE, and JOIN statements to a view and present
the data as if the data were coming from one single table.
SQL CREATE VIEW Syntax
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition
SQL Updating a View
You can update a view by using the following syntax:
SQL CREATE OR REPLACE VIEW Syntax
CREATE OR REPLACE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition
SQL Dropping a View
You can delete a view with the DROP VIEW command.
SQL DROP VIEW Syntax
DROP VIEW view_name
The most difficult part when working with dates is to be sure that the format of
the date you are trying to insert, matches the format of the date column in the
database.
SQL Server comes with the following data types for storing a date or a date/time
value in the database:
DATE - format YYYY-MM-DD
DATETIME - format: YYYY-MM-DD HH:MM:SS
SMALLDATETIME - format: YYYY-MM-DD HH:MM:SS
TIMESTAMP - format: a unique number
![Page 148: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/148.jpg)
NULL values represent missing unknown data.
By default, a table column can hold NULL values.
NULL means that data does not exist. NULL is not equal to 0 or an empty
string. Both 0 and an empty string represent a value, while NULL has no value.
Any mathematical operations performed on NULL will result in NULL. For
example,
10 + NULL = NULL
SQL IS NULL
How do we select only the records with NULL values in the "Address" column?
We will have to use the IS NULL operator:
SELECT LastName,FirstName,Address FROM Persons
WHERE Address IS NULL
SQL IS NOT NULL
How do we select only the records with no NULL values in the "Address"
column?
We will have to use the IS NOT NULL operator:
SELECT LastName,FirstName,Address FROM Persons
WHERE Address IS NOT NULL
In SQL Server, the ISNULL() function is used to replace a NULL value with
another value.
For example, if we have the following table,
Table Sales_Data
store_name Sales
Store A 300
Store B NULL
EXAMPLE :- SELECT SUM(ISNULL(Sales,100)) FROM Sales_Data;
returns 400, because the NULL has been replaced by 100 via the ISNULL function.
![Page 149: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/149.jpg)
The COALESCE function in SQL returns the first non-NULL expression among its
arguments. It is the same as the following CASE statement:
SELECT CASE ("column_name")
WHEN "expression 1 is not NULL" THEN "expression 1"
WHEN "expression 2 is not NULL" THEN "expression 2"
...
[ELSE "NULL"]
END
FROM "table_name"
EXAMPLE :-SELECT Name, COALESCE(Business_Phone, Cell_Phone,
Home_Phone) Contact_Phone
FROM Contact_Info;
The IFNULL() function in MySQL takes two arguments. If the first argument is not
NULL, the function returns the first argument. Otherwise, the second argument is
returned. This function is commonly used to replace a NULL value with another
value. It is similar to the NVL function in Oracle and the ISNULL function in SQL
Server.
For example, if we have the following table,
Table Sales_Data
store_name Sales
Store A 300
Store B NULL
EXAMPLE :- SELECT SUM(IFNULL(Sales,100)) FROM Sales_Data;
returns 400. This is because NULL has been replaced by 100 via the IFNULL
function.
![Page 150: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/150.jpg)
The NVL() function is available in Oracle, and not in MySQL or SQL Server. This
function is used to replace a NULL value with another value. It is similar to the
IFNULL function in MySQL and the ISNULL function in SQL Server.
For example, if we have the following table,
Table Sales_Data
store_name Sales
Store A 300
Store B NULL
Store C 150
EXAMPLE :- SELECT SUM(NVL(Sales,100)) FROM Sales_Data;
returns 550. This is because NULL has been replaced by 100 via the NVL
function, hence the sum of the 3 rows is 300 + 100 + 150 = 550.
The NULLIF() function takes two arguments. If the two arguments are equal, then
NULL is returned. Otherwise, the first argument is returned.
It is the same as the following CASE statement:
SELECT CASE ("column_name")
WHEN "expression 1 = expression 2 " THEN "NULL"
[ELSE "expression 1"]
END
FROM "table_name"
EXAMPLE :- SELECT Store_name, NULLIF(Actual,Goal) FROM Sales_Data;
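Assuming hypothetical Actual and Goal columns in Sales_Data, this returns NULL
for every store whose Actual equals its Goal, and the Actual value otherwise.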
Displaying the rank associated with each row is a common request, and there is
no straightforward way to do so in SQL. To display rank in SQL, the idea is to do a
self-join, list out the results in order, and do a count on the number of records
listed ahead of (and including) the record of interest. Let's use an example to
illustrate. Say we have a table Total_Sales with columns Name and Sales.
EXAMPLE :- SELECT a1.Name, a1.Sales, COUNT(a2.sales) Sales_Rank
FROM Total_Sales a1, Total_Sales a2
WHERE a1.Sales <= a2.Sales or (a1.Sales=a2.Sales and a1.Name = a2.Name)
GROUP BY a1.Name, a1.Sales
ORDER BY a1.Sales DESC, a1.Name DESC;
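To see how the self-join produces the rank, take a hypothetical Total_Sales table
with rows (Greg, 50), (Sophia, 40) and (Stella, 20). For Sophia, the WHERE clause
matches the two a2 rows with Sales >= 40 (Greg and Sophia herself), so her
Sales_Rank is 2; Greg gets 1 and Stella gets 3.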
![Page 151: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/151.jpg)
Displaying running totals is a common request, and there is no straightforward
way to do so in SQL. The idea for using SQL to display running totals is similar to
that for displaying rank: first do a self-join, then list out the results in order.
Whereas finding the rank requires counting the number of records listed ahead of
(and including) the record of interest, finding the running total requires summing
the values for the records listed ahead of (and including) the record of interest.
EXAMPLE :- SELECT a1.Name, a1.Sales, SUM(a2.Sales) Running_Total
FROM Total_Sales a1, Total_Sales a2
WHERE a1.Sales <= a2.sales or (a1.Sales=a2.Sales and a1.Name = a2.Name)
GROUP BY a1.Name, a1.Sales
ORDER BY a1.Sales DESC, a1.Name DESC;
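With the same hypothetical data, Greg's Running_Total is 50, Sophia's is 90
(50 + 40), and Stella's is 110 (50 + 40 + 20).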
To display percent to total in SQL, we leverage the ideas used for rank and
running totals, plus a subquery. Different from what we saw in the SQL Subquery
section, here we use the subquery as part of the SELECT.
EXAMPLE :- SELECT a1.Name, a1.Sales, a1.Sales/(SELECT SUM(Sales) FROM
Total_Sales) Pct_To_Total
FROM Total_Sales a1, Total_Sales a2
WHERE a1.Sales <= a2.sales or (a1.Sales=a2.Sales and a1.Name = a2.Name)
GROUP BY a1.Name, a1.Sales
ORDER BY a1.Sales DESC, a1.Name DESC;
To display cumulative percent to total in SQL, we use the same idea as in the
Percent To Total section. The difference is that we want the cumulative percent
to total, not the percentage contribution of each individual row.
EXAMPLE :- SELECT a1.Name, a1.Sales, SUM(a2.Sales)/(SELECT SUM(Sales) FROM
Total_Sales) Pct_To_Total
FROM Total_Sales a1, Total_Sales a2
WHERE a1.Sales <= a2.sales or (a1.Sales=a2.Sales and a1.Name = a2.Name)
GROUP BY a1.Name, a1.Sales
ORDER BY a1.Sales DESC, a1.Name DESC;
![Page 152: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/152.jpg)
SQL Aggregate Functions
SQL aggregate functions return a single value, calculated from values in a
column.
Useful aggregate functions:
AVG() - Returns the average value
COUNT() - Returns the number of rows
FIRST() - Returns the first value
LAST() - Returns the last value
MAX() - Returns the largest value
MIN() - Returns the smallest value
SUM() - Returns the sum
SQL Scalar functions
SQL scalar functions return a single value, based on the input value.
Useful scalar functions:
UCASE() - Converts a field to upper case
LCASE() - Converts a field to lower case
MID() - Extract characters from a text field
LEN() - Returns the length of a text field
ROUND() - Rounds a numeric field to the number of decimals specified
NOW() - Returns the current system date and time
FORMAT() - Formats how a field is to be displayed
The AVG() Function
The AVG() function returns the average value of a numeric column.
SELECT AVG(column_name) AS alias_name FROM table_name
Now we want to find the customers that have an OrderPrice value higher than
the average OrderPrice value.
We use the following SQL statement:
SELECT Customer FROM Orders
WHERE OrderPrice>(SELECT AVG(OrderPrice) FROM Orders)
The COUNT() function returns the number of rows that matches a specified
criteria.
SQL COUNT(column_name) Syntax
SELECT COUNT(column_name) FROM table_name
![Page 153: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/153.jpg)
SQL COUNT(*) Syntax
The COUNT(*) function returns the number of records in a table:
SELECT COUNT(*) FROM table_name
SQL COUNT(DISTINCT column_name) Syntax
The COUNT(DISTINCT column_name) function returns the number of distinct
values of the specified column:
SELECT COUNT(DISTINCT column_name) FROM table_name
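For example, assuming a hypothetical Orders table with a Customer column:
SELECT COUNT(DISTINCT Customer) AS NumberOfCustomers FROM Orders
counts each customer only once, however many orders they have placed.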
The FIRST() function returns the first value of the selected column.
SQL FIRST() Syntax
SELECT FIRST(OrderPrice) AS FirstOrderPrice FROM Orders
The MAX() Function
The MAX() function returns the largest value of the selected column.
SQL MAX() Syntax
SELECT MAX(column_name) AS alias_name FROM table_name
The MIN() Function
The MIN() function returns the smallest value of the selected column.
SQL MIN() Syntax
SELECT MIN(column_name) AS alias_name FROM table_name
![Page 154: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/154.jpg)
The SUM() Function
The SUM() function returns the total sum of a numeric column.
SQL SUM() Syntax
SELECT SUM(column_name) AS alias_name FROM table_name
The GROUP BY Statement
The GROUP BY statement is used in conjunction with the aggregate functions to
group the result-set by one or more columns.
SQL GROUP BY Syntax
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
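As a sketch, assuming a hypothetical Orders table with Customer and OrderPrice
columns, the total order value per customer would be:
SELECT Customer, SUM(OrderPrice)
FROM Orders
GROUP BY Customer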
The HAVING Clause
The HAVING clause was added to SQL because the WHERE keyword could not be
used with aggregate functions.
SQL HAVING Syntax
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value
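Continuing the hypothetical Orders example, to list only the customers whose
total order value exceeds 1000:
SELECT Customer, SUM(OrderPrice)
FROM Orders
GROUP BY Customer
HAVING SUM(OrderPrice) > 1000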
![Page 155: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/155.jpg)
The UPPER() function converts the value of a field to uppercase.
Syntax for SQL Server
SELECT UPPER(column_name) FROM table_name
The LOWER() function converts the value of a field to lowercase.
Syntax for SQL Server
SELECT LOWER(column_name) FROM table_name
The MID() function is used to extract characters from a text field.
SQL MID() Syntax
SELECT MID(column_name,start[,length]) FROM table_name
Example
SELECT MID(City,1,4) as SmallCity FROM Persons
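For a hypothetical City value of 'London', this returns 'Lond' (characters 1
through 4).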
The LENGTH() Function
The LENGTH() function returns the length of the value in a text field.
SQL LENGTH() Syntax
SELECT LENGTH(column_name) FROM table_name
The ROUND() Function
The ROUND() function is used to round a numeric field to the number of
decimals specified.
SQL ROUND() Syntax
SELECT ROUND(column_name,decimals) FROM table_name
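For example, assuming a hypothetical Products table with a UnitPrice column:
SELECT ROUND(UnitPrice, 2) FROM Products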
![Page 156: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/156.jpg)
STRING FUNCTION
Sometimes it is necessary to concatenate (combine) the results from several
different fields. Each database provides a way to do this:
MySQL: CONCAT()
Oracle: CONCAT(), ||
SQL Server: +
Example :- MySQL/Oracle:
SELECT CONCAT(Column1,Column2) FROM Geography
WHERE Column2 = 'Boston';
Oracle:
SELECT Column1 || ' ' || Column2 FROM Geography
WHERE Column2 = 'Boston';
SQL Server:
SELECT Column1 + ' ' + Column2 FROM Geography
WHERE Column2 = 'Boston';
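Assuming hypothetical values Column1 = 'East' and Column2 = 'Boston', the
first query returns 'EastBoston', while the two queries with an explicit ' '
separator return 'East Boston'.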
The substring function is used to grab a portion of the stored data. This function
is called differently for the different databases:
MySQL: SUBSTR(), SUBSTRING()
Oracle: SUBSTR()
SQL Server: SUBSTRING()
Example 1 :- SELECT SUBSTR(store_name, 3)
FROM Geography
WHERE store_name = 'Los Angeles';
Example 2 :- SELECT SUBSTR(store_name,2,4)
FROM Geography
WHERE store_name = 'San Diego';
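For 'Los Angeles', Example 1 returns 's Angeles' (everything from the 3rd
character on); for 'San Diego', Example 2 returns 'an D' (4 characters starting at
position 2).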
![Page 157: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/157.jpg)
The INSTR function is used to find the starting location of a pattern in a string.
This function is available in MySQL and Oracle, though they have slightly different
syntaxes. The syntax is as follows:
MySQL: INSTR(str, pattern): finds the starting location of pattern in string str.
Oracle: INSTR(str, pattern, [starting position, [nth location]])
Example 1 :-SELECT INSTR(store_name,'o')
FROM Geography
WHERE store_name = 'Los Angeles';
Example 2 :- SELECT INSTR(store_name,'p')
FROM Geography
WHERE store_name = 'Los Angeles';
Example 3 :- SELECT INSTR(store_name,'e', 1, 2)
FROM Geography
WHERE store_name = 'Los Angeles';
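Example 1 returns 2, the position of the first 'o' in 'Los Angeles'; Example 2
returns 0, because 'p' does not occur in the string; Example 3 returns 10, the
position of the 2nd 'e' counting from position 1.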
The TRIM function is used to remove a specified prefix or suffix from a string.
The most common pattern removed is white space. This function is called
differently in different databases:
MySQL: TRIM(), RTRIM(), LTRIM()
Oracle: RTRIM(), LTRIM()
SQL Server: RTRIM(), LTRIM()
Example 1 :- SELECT TRIM(' Sample ');
Example 2 :- SELECT LTRIM(' Sample ');
Example 3 :- SELECT RTRIM(' Sample ');
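Example 1 returns 'Sample', Example 2 returns 'Sample ' (only the leading
spaces removed), and Example 3 returns ' Sample' (only the trailing spaces
removed).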
![Page 158: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/158.jpg)
The length function is used to get the length of a string. This function is called
differently for the different databases:
MySQL: LENGTH()
Oracle: LENGTH()
SQL Server: LEN()
Example 1 :- SELECT Length(store_name)
FROM Geography
WHERE store_name = 'Los Angeles';
Example 2 :- SELECT region_name, Length(region_name)
FROM Geography;
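Example 1 returns 11, the number of characters in 'Los Angeles' (the space
counts).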
The REPLACE function is used to update the content of a string. The function call
is REPLACE() for MySQL, Oracle, and SQL Server. The syntax of the REPLACE
function is:
Replace(str1, str2, str3): In str1, find where str2 occurs, and replace it with str3.
Example :- SELECT REPLACE(region_name, 'ast', 'astern')
FROM Geography;
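Assuming hypothetical region_name values 'East' and 'West', this returns
'Eastern' and 'West': 'ast' occurs in 'East' and is replaced with 'astern', while
'West' contains no 'ast' and is left unchanged.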
DATE FUNCTION (SQL SERVER)
The DATEADD function is used to add an interval to a date. This function is
available in SQL Server. The usage for the DATEADD function is
DATEADD (datepart, number, expression)
Example :- SELECT DATEADD(day, 10,'2000-01-05 00:05:00.000');
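This returns '2000-01-15 00:05:00.000', i.e. the original date plus 10 days.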
The DATEDIFF function is used to calculate the difference between two dates,
and is available in MySQL and SQL Server.
Example :- SELECT DATEDIFF(day, '2000-01-10','2000-01-05');
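This returns -5, because the end date ('2000-01-05') is 5 days before the start
date ('2000-01-10').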
DATEPART is a SQL Server function that extracts a specific part of a date/time
value. Its syntax is as follows:
DATEPART (datepart, expression)
![Page 159: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/159.jpg)
Example :- SELECT DATEPART (yyyy,'2000-01-20');
Example :- SELECT DATEPART(dy, '2000-02-10');
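The first example returns 2000 (the year part); the second returns 41, since
February 10 is the 41st day of the year.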
GETDATE() is used to retrieve the current database system time in SQL Server.
Its syntax is
GETDATE()
Example :- SELECT GETDATE();
SYSDATE is used to retrieve the current database system time in Oracle and MySQL.
Example :- SELECT SYSDATE FROM DUAL;
![Page 160: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/160.jpg)
Installation log files
Troubleshooting
![Page 161: 86043838 Datastage Interview](https://reader036.vdocuments.net/reader036/viewer/2022081801/553fd727550346d66e8b495a/html5/thumbnails/161.jpg)
%TEMP%\ibm_is_logs
Troubleshooting