recipe 15 of data warehouse and business intelligence - how to monitor and analyze a data file...

How to monitor and analyze a data file How to monitor and analyze a data file before loading in Staging Areabefore loading in Staging Area

Recipes of Data Warehouse and Business IntelligenceRecipes of Data Warehouse and Business Intelligence

• In this use case, that is part of the Micro ETL Foundation (MEF), we try to handle a preliminary analysis of the data files.

• Configure and load the first test data file, is a activities which can be very time consuming. Very often the data file is full of special characters and other abnormalities that prevent its loading.

• Sometimes the data file is loaded correctly in the Staging Area, but the presence of "dirty" and/or special characters creates problems later to the Business Intelligence interfaces or to the HTML reports.

• Then, I will provide you a tool that can help identify problems immediately about data files. This will allow us to alert the feeding system indicating very precisely where are the anomalies.

Analysis of the use case

• There are no changes to the repository MEF. Download from (https://drive.google.com/open?id=0B2dQ0EtjqAOTZk1Eb3J0UEVsMnc) The installation is very simple and it will act as a plug-in to MEF. They will only be added new structures.

• If you haven’t MEF, take the latest version under (https://drive.google.com/open?id=0B2dQ0EtjqAOTNjZlUFR0NkIyQm8) and follow the readme file for the installation.

• Very useful is the new MEF_ASCII_CFT. This table was loaded with the 256 encodings ascii with descriptions. We see a fragment in the next figure.

• The table MEF_ANAFILE_LOT will contain the result of the analysis of the data file with only the lines that contained abnormal situations, on the "dirty" characters or on incorrect number of column separators (only for data file of csv type).

Changes to MEF repository

Changes to MEF repository

To test, simply run the procedure mef_dfana.p_analyze. Care must be taken in the input parameters:p_io_cod – data file code - This parameter does not affect the procedure. It can be useful for a link to the configuration table of the data file.p_dir - file folder - Oracle directory for the path of the data file. For example DWH_DAT.p_file - file name - File name. We can try a data file of a previous use case, for example, regmar_20160205.csvp_t1 - decimal format of the 1st terminator - It may happen that to indicate the end of row are used more characters. Indicate the decimal code of the first character. For example, specify 10 for the carriage return <CR>p_t2 - decimal format of the 2nd terminator if exists - Indicate the decimal code of the second character. For example, specify 13 for the line feed <LF>p_t3 - decimal format of the 3rd terminator if exists - Indicate the decimal code of the third character if it existsp_sep - decimal format of the columns separator - Indicate the decimal code of the field separator if it exists.(59=";",44=",")

Execute the use case

p_sep_cnt - counter of separators - Indicate the number of the expected field separators. Generally it coincides with the number of fields, but sometimes the last field is still followed by a separator. If you are unsure, it leaves null this parameter: the procedure will use the number of separatorsof the first row.p_from - analyze row from - Number of the line from which the analysis of the data file will start. The default is 1.p_to - analyze row to - Number of the last line to be analyzed. If null, it means the entire file. Could be useful, for very large data file, first try on a reduced number of rows.

To control the end-of-row characters, I suggest to open the data file by an editor with the exadecimal/decimal visualization. On open, don't convert to others format. Usually they are 10 <LF> or 13+10 <CR><LF>. To test, We can use the data files of the use cases.

SQL> exec mef_dfana.p_analyze('TEST','DWH_DAT','regmar_20160205.csv',10,null,null,44,null);


The message log table of MEF:


The result on the view MEF_ANAFILE_LOV:

References

http://www.slideshare.net/jackbim/recipe-14-of-data-warehouse-and-business-intelligence-build-a-staging-area-for-an-oracle-data-warehouse-1

http://massimocenci.blogspot.it/Email: [email protected]

recipe 15 of data warehouse and business intelligence - how to monitor and analyze a data file...

Software