ihr logo data explorer - a data profiling tool. your logo agenda introduction existing system ...
Post on 26-Dec-2015
215 Views
Preview:
TRANSCRIPT
Ihr Logo
Data Explorer- A data profiling tool
Your Logo
Agenda Introduction
Existing System
Limitations of Existing System
Proposed Solution
Project Scope
Block Diagram
Implementation
Technology
Hardware and Software Requirements
Features and Benefits
Future Enhancement
Data Explorer – A Data Profiling Tool
Your Logo
Introduction (1/2)
Data profiling is the process of examining the data available in an existing data source (e.g. a database or a file) and collecting statistics and information about that data.
Data profiling is an analysis of the candidate data sources for a data warehouse to clarify the structure, content, relationships and derivation rules of the data. Profiling helps to understand anomalies and to assess data quality, but also to discover, register, and assess enterprise metadata.
The purpose of data profiling is both to validate metadata when it is available and to discover metadata when it is not.
The result of the analysis is used both strategically, to determine suitability of the candidate source systems and give the basis for an early go/no-go decision, and tactically, to identify problems for later solution design, and to level sponsors’ expectations.
Data Profiling
Data Explorer – A Data Profiling Tool
Your Logo
Introduction (2/2)
Find out whether existing data can easily be used for other purposes
Improve the ability to search the data by tagging it with keywords, descriptions, or assigning it to a category
Give metrics on data quality, including whether the data conforms to particular standards or patterns
Assess the risk involved in integrating data for new applications, including the challenges of joins
Assess whether metadata accurately describes the actual values in the source database
Understanding data challenges early in any data intensive project, so that late project surprises are avoided. Finding data problems late in the project can lead to delays and cost overruns.
Pourpose of Data Profiling
Data Explorer – A Data Profiling Tool
Your Logo
Existing System
Initially the data Profiling activities used to be done by writing complicated SQL queries
This would be comfortable for analyst or user who knows to write SQL queries
Many of us do not know the proper syntax and format for writing SQL queries
To overcome this, Data Profiling tools were introduced
Data Profiling Tools, to a some extent overcome the limitations for writing complex queries
All types of profiling activities were not supported by the tools
User has to understand and learn how to use the tool
Data Explorer – A Data Profiling Tool
Your Logo
Limitations of Existing System
Development time is more.
Need to understand the functionality for developing the queries.
Results needs to be exported to excel or notepad for anlysis
Traditional Approach
Complex User Interface
Limited Functionality.
Setup and Installation.
License Cost.
Minimum Server Requirements
SQL Queries Existing Tools
Data Explorer – A Data Profiling Tool
Your Logo
Proposed Solution
Developing an Application performing all the types of profiling
Easy to use interface
Minimum system requirements
Feature to export the profiling results data to excel
Additional feature to indicate the Data Quality i.e. Data Quality Indicator
Supporting multiple Databases like Oracle 10g, Oracle 11g, MS SQL Server 2005, MS SQL Server 2008, My SQL etc
Data Explorer – A Data Profiling Tool
Your Logo
Project Scope
Keeping the Time Line and other factors in mind, the project will currently support only MS SQL Server
Also the project will have following types of Profiling:
Column Profiling
Empty Column Analysis
Null Rule Analysis
Constant Analysis
Frequency Analysis
Uniqueness Analysis
Primary/Composite Key Analysis
Data Explorer – A Data Profiling Tool
Your Logo
Architecture Diagram
Data Explorer – A Data Profiling Tool
Analysis TeamAnalysis Team ManagementManagementBusiness UsersBusiness Users
Data ExplorerData ExplorerData
ProfilingCentral
Metadata Repository
Capture Issues
and Notes
MS SQL ServerMS SQL Server Other DatabasesOther Databases
Reporting
Your Logo
Implementation
The project will be implemented module wise.
Project will be having different modules. Each module will be developed individually and Unit Tested
After completion of all the modules and unit testing, all the modules will be integrated and System Integration Testing will be performed
There will be separate modules for Databases retrieval from server, Tables retrieval after selecting a database, Columns retrieval after selecting a Table
There will be separate module for each type of profiling discussed.
Data Explorer – A Data Profiling Tool
Your Logo
Implementation - Profiling Details
Column Profiling
This will help in discovering total no of records, null percentage, unique percentage, minimum and maximum value in the column, documented data type etc.
Constant Analysis
This will help in discovering those columns which has less than 4 and greater than 0 distinct values.
Null Rule Analysis
This will help in finding all the columns in a table which has 100% NULL values
Data Explorer – A Data Profiling Tool
Your Logo
Implementation - Profiling Details
Unique Analysis
This will help in finding all the columns in table which has 100% uniqueness.
Primary Key / Composite Key Analysis
It will help us to find out the possible primary or composite key columns which can be have unique combination.
Frequency Analysis
This will help in finding the no. of distinct values in the columns and the no. of time the value is repeated in a column.
Data Explorer – A Data Profiling Tool
Your Logo
Technology
Data Explorer will be developed on .NET platform using C# as a coding language.
.NET is Microsoft platform for developing advanced and Robust applications
.NET supports a wide range of library classes which eases the development efforts and hence more time can be utilized in other activities
.NET is called Language Independent Platform as it support 4 native languages and 21 non-native languages.
Native Languages are a Microsoft created languages i.e. C#. VB.Net. J#, VC++
Non-Native or Non Microsoft Languages supported are Pearl, Ruby on Rails etc
Data Explorer – A Data Profiling Tool
Your Logo
Hardware and Software Requirements
Data Explorer – A Data Profiling Tool
Data Explorer
HARDWARE SOFTWARE
• Pentium Core 2 Duo processor or above
• 2 GB RAM • 20 GB HDD • Printer • Router for Internet
Connection
• Windows 2000/ Windows XP/ Windows Vista/ Windows 7
• Microsoft .NET Framework 3.5
• Microsoft Visual Studio 2008
Your Logo
Features
Supports multiple databases like MS SQL Server, Oracle
Different types of profiling like
Column Profiling
Constant Analysis
Unique Analysis
Null Rule Analysis
Frequency Analysis
Empty Column Analysis
Primary / Composite Key Analysis
Quickly Analyze and validate data issues
Data Explorer – A Data Profiling Tool
Your Logo
Benefits
Quick discovery of data issues
No more writing of queries to profile data
Time efficient
Shorten the implementation cycle of major projects
Improve understanding of data for the users
Discovering business knowledge
Improves data accuracy in corporate databases
Data Explorer – A Data Profiling Tool
Your Logo
Future Enhancement
Data Explorer can be further extended to support unstructured or semi-structured data like flat files, .csv files
It can also be extended to support other relation data bases like MS Access, MySQL, Sybase etc Time efficient
It can also be enhanced by including Data Quality reports on top of Data Quality Results
There can be mechanism to store the profiling results so that it can be used or referred later at any point of time
Data Explorer – A Data Profiling Tool
Ihr Logo
Thank You
Data Explorer – A Data Profiling Tool
top related