IDQ Functionality

Upload: jeevan-reddy-pareddy

Post on 11-Jul-2016

Page 1: IDQ Functionality

6 Dimensions of Data Quality

Overview:

To be successful in business, you need to make decisions quickly, based on the right information.

While a business intelligence system makes it much simpler to analyze and report on the data loaded into a data warehouse, the existence of data alone does not ensure that executives can make decisions smoothly; the quality of the data is equally important.

Consider a high-level meeting to review company performance: if you learn that two reports compiled from supposedly the same set of data reflect two different revenue figures, no one can know which figures are accurate, which could cause important decisions to be postponed while the “truth” is investigated.

One cause of data quality issues lies in the source data itself: data sources can contain scattered or misplaced values, outdated and duplicate records, and inconsistent (or undefined) data standards and formats across customers, products, transactions, financials and more. But perhaps the largest contributor to data quality issues is that data are entered, edited, maintained, manipulated and reported on by people.

On the surface, it is obvious that data quality is about cleaning up bad data – data that are missing, incorrect or invalid in some way. But in order to ensure data are trustworthy, it is important to understand the key dimensions of data quality to assess how the data are “bad” in the first place.

Page 3: IDQ Functionality

Completeness: Completeness is defined as expected comprehensiveness. Data can be complete even if optional data is missing; as long as the data meets expectations, it is considered complete.

For example, a customer’s first name and last name are mandatory but middle name is optional; so a record can be considered complete even if a middle name is not available.

Questions you can ask yourself: Is all the requisite information available? Are data values missing, or in an unusable state? In some cases missing data is irrelevant, but when the missing information is critical to a specific business process, completeness becomes an issue.
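The first-name/middle-name example above can be sketched as a small check. This is a minimal illustration, not IDQ code; the field names are assumptions:

```python
# Toy completeness check: mandatory fields must be present and non-empty;
# optional fields (e.g. middle name) may be missing without penalty.
MANDATORY = ["first_name", "last_name"]
OPTIONAL = ["middle_name"]

def is_complete(record):
    """A record is complete when every mandatory field has a usable value."""
    return all(str(record.get(f) or "").strip() for f in MANDATORY)

records = [
    {"first_name": "Ada", "last_name": "Lovelace"},                # no middle name: still complete
    {"first_name": "Alan", "middle_name": "M.", "last_name": ""},  # empty mandatory value
]
results = [is_complete(r) for r in records]
print(results)  # [True, False]
```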

Consistency: Consistency means data across all systems reflect the same information and are in sync with each other across the enterprise. Examples:

A business unit status is closed but there are sales for that business unit.

Employee status is terminated but pay status is active.

Page 4: IDQ Functionality

Questions you can ask yourself: Are data values the same across the data sets? Are there any distinct occurrences of the same data instances that provide conflicting information?
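The closed-business-unit example above can be expressed as a cross-dataset check. A minimal sketch, with illustrative data structures:

```python
# Toy cross-system consistency check: a business unit that is marked
# closed should not appear in the sales feed.
business_units = {"BU1": "open", "BU2": "closed"}
sales = [{"unit": "BU1", "amount": 100}, {"unit": "BU2", "amount": 50}]

def consistency_violations(units, sales_rows):
    """Return sales rows recorded against closed business units."""
    return [s for s in sales_rows if units.get(s["unit"]) == "closed"]

violations = consistency_violations(business_units, sales)
print(violations)  # [{'unit': 'BU2', 'amount': 50}]
```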

Conformity: Conformity means the data follows a set of standard data definitions such as data type, size and format. For example, a customer's date of birth is stored in the format “mm/dd/yyyy”.

Questions you can ask yourself: Do data values comply with the specified formats? If so, do all of them? Maintaining conformance to specific formats is important.
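The “mm/dd/yyyy” example above is typically enforced with a pattern check. A minimal sketch using a regular expression (the pattern is illustrative):

```python
import re

# Toy conformity check: dates of birth must match the "mm/dd/yyyy" format.
DOB_FORMAT = re.compile(r"^(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/\d{4}$")

def conforms(value):
    """True when the value matches the agreed date format."""
    return bool(DOB_FORMAT.match(value))

print(conforms("07/04/1990"))  # True
print(conforms("1990-07-04"))  # False: wrong format, even if the date itself is valid
```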

Accuracy: Accuracy is the degree to which data correctly reflects the real-world object or event being described. Examples:

Incorrect spellings of product or person names or addresses, and even untimely or out-of-date data, can impact operational and analytical applications.

Sales recorded for a business unit reflect the unit's real sales value.

An employee's address in the employee database is the employee's real address.

Questions you can ask yourself: Do data objects accurately represent the “real world” values they are expected to model?

Duplication: Are there multiple, unnecessary representations of the same data objects within your data set? The inability to maintain a single representation for each entity across your systems poses numerous vulnerabilities and risks.
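A simple way to surface exact duplicates is to count records sharing the same natural key. A minimal sketch; the key choice (normalized name plus ZIP) is an assumption for illustration:

```python
from collections import Counter

# Toy duplicate check: count records that share the same natural key.
records = [
    {"name": "John Smith", "zip": "10001"},
    {"name": "JOHN SMITH", "zip": "10001"},   # same entity, different casing
    {"name": "Jane Doe",   "zip": "94105"},
]

def duplicate_keys(rows):
    """Return keys that occur more than once after normalization."""
    keys = Counter((r["name"].lower(), r["zip"]) for r in rows)
    return [k for k, n in keys.items() if n > 1]

dupes = duplicate_keys(records)
print(dupes)  # [('john smith', '10001')]
```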

Integrity: Integrity means the validity of data across relationships, and ensures that all data in a database can be traced and connected to other data.

For example, in a customer database there should be valid customers, valid addresses, and valid relationships between them. If address relationship data exists without a customer, that data is not valid and is considered an orphaned record.

Questions you can ask yourself: Is any data missing important relationship linkages? The inability to link related records together may actually introduce duplication across your systems.
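The orphaned-record example above amounts to a referential-integrity check. A minimal sketch with illustrative tables:

```python
# Toy referential-integrity check: every address row must reference an
# existing customer; addresses that do not are orphaned records.
customers = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
addresses = [
    {"customer_id": 1, "city": "Austin"},
    {"customer_id": 9, "city": "Boston"},  # no customer 9: orphan
]

def orphaned(addrs, custs):
    """Return address rows whose customer_id has no matching customer."""
    valid_ids = {c["id"] for c in custs}
    return [a for a in addrs if a["customer_id"] not in valid_ids]

orphans = orphaned(addresses, customers)
print(orphans)  # [{'customer_id': 9, 'city': 'Boston'}]
```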

Timeliness: Timeliness refers to whether information is available when it is expected and needed. Timeliness of data is very important. This is reflected in:

Companies that are required to publish their quarterly results within a given time frame

Customer service providing up-to-date information to customers

A credit system checking credit card account activity in real time

Timeliness depends on user expectations: online availability of data may be required for a room-allocation system in hospitality, while nightly data may be perfectly acceptable for a billing system.
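The expectation-dependent nature of timeliness can be sketched as a freshness check against a consumer-specific window. The SLA values below are illustrative assumptions:

```python
from datetime import datetime, timedelta

# Toy timeliness check: a record is timely when its last refresh falls
# within the freshness window the consumer expects.
def is_timely(last_refreshed, now, max_age):
    return (now - last_refreshed) <= max_age

now = datetime(2016, 7, 11, 12, 0)
nightly_sla = timedelta(hours=24)    # fine for a billing system
realtime_sla = timedelta(seconds=5)  # needed by a credit-check system

refreshed = datetime(2016, 7, 11, 2, 0)  # last night's load
print(is_timely(refreshed, now, nightly_sla))   # True
print(is_timely(refreshed, now, realtime_sla))  # False
```

The same record is thus “timely” for one consumer and stale for another, which is exactly the point of the dimension.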

-----------------------------------------------------------------------

Page 5: IDQ Functionality

IDQ Functionality:

IDQ (Informatica Data Quality) is used from a data quality analysis perspective: it provides views of data subsets and attribute details that show what is wrong with the data and what is actually being used. It can generate various levels of reports and graphs based on the data, and is used for understanding, acting on, and reporting on data quality.

Use IDQ to design and run processes to complete the following tasks:

Profile data: Profiling reveals the content and structure of data.

Profiling is a key step in any data project, as it can identify strengths and weaknesses in data and help you define a project plan.
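The kind of summary a profiling step surfaces can be sketched in a few lines. This is an illustration of the idea, not IDQ's profiling engine:

```python
# Toy column profile: null count, distinct count, and value-length range
# per field, the kind of summary profiling surfaces before deeper analysis.
rows = [
    {"name": "Ada",  "city": "London"},
    {"name": "Alan", "city": None},
    {"name": "Ada",  "city": "London"},
]

def profile(rows, field):
    values = [r.get(field) for r in rows]
    present = [v for v in values if v is not None]
    return {
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min_len": min((len(v) for v in present), default=0),
        "max_len": max((len(v) for v in present), default=0),
    }

print(profile(rows, "city"))  # {'nulls': 1, 'distinct': 1, 'min_len': 6, 'max_len': 6}
```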

Create scorecards to review data quality: A scorecard is a graphical representation of the quality measurements in a profile.

Standardize data values: Standardize data to remove errors and inconsistencies that you find when you run a profile. You can standardize variations in punctuation, formatting, and spelling. For example, you can ensure that the city, state, and ZIP code values are consistent.
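The city/state/ZIP example above can be sketched as a normalization function. The lookup table and rules are illustrative assumptions, not IDQ's standardization content:

```python
# Toy standardization: normalize casing, punctuation, and spacing so that
# variant spellings of the same city/state/ZIP collapse to one form.
STATE_NAMES = {"calif.": "CA", "california": "CA", "ca": "CA"}  # illustrative lookup

def standardize(record):
    city = " ".join(record["city"].split()).title()   # collapse spaces, fix casing
    state = STATE_NAMES.get(record["state"].strip().lower(), record["state"].upper())
    zip_code = record["zip"].strip()[:5]              # keep the 5-digit ZIP
    return {"city": city, "state": state, "zip": zip_code}

result = standardize({"city": "  san  FRANCISCO ", "state": "Calif.", "zip": " 94105-1234"})
print(result)  # {'city': 'San Francisco', 'state': 'CA', 'zip': '94105'}
```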

Parse data: Parsing reads a field composed of multiple values and creates a field for each value according to the type of information it contains. Parsing can also add information to records. For example, you can define a parsing operation to add units of measurement to product data.
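Parsing a multi-valued field into one field per component can be sketched as below. The “10x500ml” pack-size pattern is an assumed example, not from the source:

```python
import re

# Toy parsing step: read one composite field and emit a typed field per
# component it contains.
def parse_pack_size(value):
    """Split e.g. '10x500ml' into count, quantity and unit of measure."""
    m = re.match(r"^(\d+)x(\d+)(ml|l|g|kg)$", value)
    if not m:
        return None
    return {"count": int(m.group(1)), "quantity": int(m.group(2)), "uom": m.group(3)}

print(parse_pack_size("10x500ml"))  # {'count': 10, 'quantity': 500, 'uom': 'ml'}
```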

Validate postal addresses: Address validation evaluates and enhances the accuracy and deliverability of postal address data. Address validation corrects errors in addresses and completes partial addresses by comparing address records against address reference data from national postal carriers. Address validation can also add postal information that speeds mail delivery and reduces mail costs.
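The shape of address validation — correcting and completing records against reference data — can be sketched with a tiny in-memory lookup. Real validation uses postal-carrier reference data; the table below is a toy assumption:

```python
# Toy address validation against a tiny in-memory reference table keyed by ZIP.
ZIP_REFERENCE = {"10001": ("New York", "NY"), "94105": ("San Francisco", "CA")}

def validate_address(addr):
    """Correct or complete city/state from reference data; flag unknown ZIPs."""
    ref = ZIP_REFERENCE.get(addr.get("zip"))
    if ref is None:
        return {**addr, "valid": False}
    city, state = ref
    return {**addr, "city": city, "state": state, "valid": True}

fixed = validate_address({"zip": "94105", "city": "San Fransisco"})  # misspelled city corrected
print(fixed)  # {'zip': '94105', 'city': 'San Francisco', 'state': 'CA', 'valid': True}
```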

Find duplicate records: Duplicate analysis calculates the degrees of similarity between records by comparing data from one or more fields in each record. You select the fields to be analyzed, and you select the comparison strategies to apply to the data. The Developer tool enables two types of duplicate analysis: field matching, which identifies similar or duplicate records, and identity matching, which identifies similar or duplicate identities in record data.
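Field matching as described above scores record pairs on chosen fields and flags pairs above a threshold. A minimal sketch using standard-library string similarity (the threshold and fields are assumptions):

```python
from difflib import SequenceMatcher

# Toy field matching: score pairs of records by string similarity on a
# chosen field and flag pairs above a threshold as likely duplicates.
def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def likely_duplicates(rows, field, threshold=0.85):
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if similarity(rows[i][field], rows[j][field]) >= threshold:
                pairs.append((i, j))
    return pairs

names = [{"name": "Jon Smith"}, {"name": "John Smith"}, {"name": "Jane Doe"}]
print(likely_duplicates(names, "name"))  # [(0, 1)]
```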

Create reference data tables: Informatica provides reference data that can enhance several types of data quality processes, including standardization and parsing. You can create reference tables using data from profile results.

Create and run data quality rules: Informatica provides rules that you can run or edit to meet your project objectives. You can create mapplets and validate them as rules in the Developer tool.

Collaborate with Informatica users: The Model repository stores reference data and rules, and this repository is available to users of the Developer tool and the Analyst tool. Users can collaborate on projects, and different users can take ownership of objects at different stages of a project.

Page 6: IDQ Functionality

Export mappings to PowerCenter: You can export mappings to PowerCenter to reuse the metadata for physical data integration or to create web services.

PROFILING:

------------------------------------------------------------------------------------------------------------------------------------------

IDQ Questions:

IDQ Pros and Cons

https://www.trustradius.com/products/informatica-data-quality/reviews

www.tekclasses.in for Demo

What are the most used transformations in IDQ?

What is Address Doctor?

Can we export an object from IDQ to the PowerCenter tool? If yes, how?

What is a reference table?

In IDQ, is it possible to create user-defined reference tables? In what circumstances might they be required?

What is a parser transformation?

What is the functionality of labeler transformation?

How do you export all object profiles at once?

Does IDQ have an emailing system like PowerCenter?

How can we publish IDQ SSR results on the Intranet/Web?

What types of IDQ plans can be exported as mapplets to PowerCenter?

How do you add many physical data objects at one time?

Is there a way we can parameterize Notifications Recipients list in Exception task inside a Human Task?

Can we use Oracle tables as reference tables in IDQ?


Page 7: IDQ Functionality

How do you check in Informatica Data Quality which fields in a range are unique?

What algorithms do tools such as DataFlux or Informatica use to remove duplicates?

How do you include your mapping output variables in the workflow without having to use a Human task?

-----------------------------------------------------------------------

Persistent cache in lookup

Bulk vs. normal load

Constraint-based load ordering
