TRANSCRIPT
Strategic Data Management – Conforming the Data Warehouse
Session M6, September 24, 2007
Andrea Matulick, Acting Manager, Business Intelligence, Planning and Assurance Services, UniSA
Robert Davies, Technical Team Leader, Enterprise Data Warehouse, ISTS, UniSA
Strategic Data Management – Conforming the data warehouse
• Strategy
• Data Management
• Conforming data
Why do we need strategic data management?
• To support the strategic planning cycle
• To maximize performance and funding
• To support integrated processes using technology
• To become an analytic organisation
The standard business Strategic Planning cycle:
• Formulate the strategy (using decisions from evidence-based data)
• Communicate the strategy
• Analyse scenarios
• Prepare plans and budgets
• Monitor, forecast, report against actual data
• Feed back the results for the next strategy cycle
Maximising performance and funding
Measuring performance - examples of Key Performance Indicators
Student Demand | applications, preferences, TER
Research Performance | publications, research income, completions
Student Staff Ratio | student load (EFTSL), staff FTE
Maximising funding - examples of funding formulae
Commonwealth Grant Scheme (CGS) Funding Agreement | total$ = Sum of (CGS student (EFTSL) per cluster x cluster funding rate$)
Research Training Scheme (RTS) | HEP’s specific performance index = (HDR completions x 0.5) + (Research Income x 0.4) + (Research Publications x 0.1)
Learning and Teaching Performance Fund (L&TPF) | Student demand (applications, load); Student experience (CEQ overall satisfaction, generic skills, good teaching) (GDS employment and further study); Student progression (success rate, retention rate, level of study)
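The two funding formulae above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names, cluster names and all figures are hypothetical, and the slide does not say what units or scaling the RTS components use, so the inputs are treated as plain numbers.

```python
from typing import Mapping

# Weights as quoted for the RTS performance index on the slide.
RTS_WEIGHTS = {
    "hdr_completions": 0.5,
    "research_income": 0.4,
    "research_publications": 0.1,
}


def cgs_total(load_by_cluster: Mapping[str, float],
              rate_by_cluster: Mapping[str, float]) -> float:
    """Sum of (student load in EFTSL per cluster x cluster funding rate)."""
    return sum(load * rate_by_cluster[cluster]
               for cluster, load in load_by_cluster.items())


def rts_performance_index(components: Mapping[str, float]) -> float:
    """Weighted sum of the three RTS components."""
    return sum(components[name] * weight
               for name, weight in RTS_WEIGHTS.items())


# Hypothetical figures, for illustration only.
load = {"cluster_a": 100.0, "cluster_b": 50.0}
rate = {"cluster_a": 2000.0, "cluster_b": 4000.0}
total = cgs_total(load, rate)

index = rts_performance_index({
    "hdr_completions": 2.0,
    "research_income": 1.0,
    "research_publications": 3.0,
})
```

Every input to these functions is a measure that has to be sourced, conformed and aggregated somewhere, which is why the data underlying the formulae matters right down to transactional level.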
Which data is important?
• focus on key issues and critical facts and measures
• do not overload with data from every aspect of the organisation
• data used in key performance indicators and funding formulae
• all data that underlies those indicators, right down to transactional level
• additional external competitor and benchmarking data
• standardised and conformed data
“the right information must be delivered to the right people at the right time and in the right context”
Why transactional systems cannot help implement strategy
• transactional systems collect and store data on day to day operational activities
• they do not focus on key issues and critical facts
• data is siloed into source application areas (e.g. students, staff, finance), not combined into logical business processes
• data is not conformed (standardised) so that data may be aggregated across systems
• transactional data is not suitable for corporate analysis, reporting or dashboard applications
How can electronic decision support systems help with implementing strategy?
• tie together computer applications and business processes related to key measures
• be able to analyse and report on the results in a timely and accurate manner
• move beyond transactional processing systems, incorporate key managerial processes into technology by integrating applications and using decision support systems
“While 76 percent of executives cite strategic planning as the top management tool to improve long term performance and strengthen integration across an organisation, only 33 percent of executives use electronic decision support tools that could help them in managing performance” (Hackett, business survey 2002)
Electronic decision support systems – to help implement strategy
System | Major Function
Corporate Performance Management system (CPM) | Integrate strategic planning documents, organisational processes, targets and responsibilities
Dashboard system | For senior managers to customise key views of performance in their area of responsibility
Benchmarking and Scorecarding system | Record and monitor key performance indicators against targets, indicate success, failure and alerts
Budgeting and Forecasting system | Using actual data to model scenarios and predict future trends
Business Intelligence presentation layer | Enable user access to information via data, analysis and reports
Data Warehouse | Reorganise transactional data into logical business models for corporate analysis and reporting needs. Combine data required for KPIs, add value through external and conformed data, etc.
Metadata system | Provide users context about the data, record data source, lineage, definitions, business rules, etc.
Master Data Management system | Centrally store and maintain the major common data dimensions used by all areas of the organisation (e.g. org structure)
Data Quality system | Conform and standardise common data dimensions across the organisation, identify data errors and anomalies
Transactional systems | Collect source data and process day to day transactions for the organisation
How do resources affect strategic data management?
• transactional systems well resourced, supported, upgraded regularly
• other functions relatively unsupported financially, without sufficient experienced BI resources
• management continually complains about lack of reports and analysis, lagging timeliness of information, consistency and standardisation of results and definitions, lack of forecasting and scenario planning, lack of competitor data, benchmarking, etc.
• need to allocate appropriate resources to decision support systems
How does the standard and integrity of data affect corporate decisions?
• “Data warehousing holds much promise to provide competitive advantage through derived business intelligence, but the promise cannot be realised unless you ensure the integrity of your data. You must have end-to-end controls and the ability to identify data anomalies in source data from many operational systems. These controls are an integral part of essential data management best practice.” (Maurer, IBM, DMReview.com, July 2007)
• Data Quality software products assist in identifying data quality issues, but cannot fix data
• Most organisations do not factor data quality resources into any of their plans; it takes a very low priority.
• Organisations do not realise that this omission may be producing poor data on which they are basing their strategic decisions.
Why is conforming the data important?
• transactional systems contain very little data quality control
• free text fields mean questionable validity, consistency, standards
• reports and analysis become increasingly difficult; a computer does not recognise data to be the same unless it is identical (e.g. ‘male’ and ‘M’, ‘1’ and ‘01’ are not the same to a computer)
• companies are using Master Data Management systems to maintain common data used by multiple transactional applications
• consistency and standardisation of data and business rules across an organisation are essential for the quality and usefulness of corporate analysis and reports
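The ‘male’ versus ‘M’ and ‘1’ versus ‘01’ problem can be illustrated with a minimal Python sketch. The mapping table, the function names and the two-character code width are assumptions made for the example, not UniSA's actual reference data or rules.

```python
# Hypothetical conformed code set for a gender attribute; the mapping
# values are illustrative only.
GENDER_CONFORMED = {"m": "M", "male": "M", "f": "F", "female": "F"}


def conform_gender(raw: str) -> str:
    """Map a free-text source value to a conformed code, or 'Unknown'."""
    return GENDER_CONFORMED.get(raw.strip().lower(), "Unknown")


def conform_numeric_code(raw: str, width: int = 2) -> str:
    """Zero-pad a numeric code so that '1' and '01' compare equal."""
    return raw.strip().zfill(width)


# Before conforming, the raw values are simply different strings.
raw_values_differ = ("male" != "M") and ("1" != "01")

# After conforming, values from different systems can be aggregated together.
same_gender = conform_gender("male") == conform_gender(" M ")
same_code = conform_numeric_code("1") == conform_numeric_code("01")
```

Values that match no entry fall through to 'Unknown' rather than being guessed, so they can be reported back to the owner of the source system.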
How does a data warehouse assist in organising, standardising and conforming data?
• A data warehouse reorganises transactional data into logical business models for corporate analysis and reporting needs
• Combines data from multiple systems and external data in a way that is meaningful to the business
• Uses standard business rules and conformed standard code sets
• Can only report across data from multiple systems if the dimensions are conformed and can be reused across the fact data
• Highlights the need for data quality frameworks and master data management systems to be part of the data management strategy
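Reporting across systems through a conformed dimension can be sketched as follows. This is a simplified in-memory illustration, not the warehouse design itself: the fact rows and measures are hypothetical, and a real implementation would do this in the database or OLAP layer.

```python
from collections import defaultdict

# Hypothetical conformed org-unit dimension shared by both fact sets.
org_dim = {1234: "Info Tech", 1235: "Grounds"}

# Hypothetical fact rows as (org_key, measure), each set loaded from a
# different source system but keyed by the same conformed org key.
publication_facts = [(1234, 3.0), (1234, 1.0), (1235, 2.0)]
staff_fte_facts = [(1234, 8.0), (1235, 4.0)]


def total_by_org(facts):
    """Aggregate a measure by the conformed org key."""
    totals = defaultdict(float)
    for org_key, measure in facts:
        totals[org_key] += measure
    return totals


def publications_per_fte(pub_facts, fte_facts, dim):
    """Combine two fact sets using the shared dimension."""
    pubs, fte = total_by_org(pub_facts), total_by_org(fte_facts)
    # Skip org units with no staff FTE to avoid dividing by zero.
    return {dim[k]: pubs[k] / fte[k] for k in dim if fte.get(k)}


report = publications_per_fte(publication_facts, staff_fte_facts, org_dim)
```

The division only makes sense because both fact sets resolve to the same org key; if each system used its own org codes, the two totals could not be lined up.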
Data Warehouse Integration Matrix
Data that conforms well - external National Research Performance data
• Research funding (RTS) and the research quality framework (RQF) use publications, income and completions to measure performance.
• DEST provides national data on these measures as a series of reports in a spreadsheet.
• The data can be loaded into a data warehouse along with standard reference data to analyse trends, share, benchmarking, rankings etc.
• By combining the warehouse data in an OLAP cube, we can see how our university is performing in the sector.
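Share and ranking are simple derived measures once the national data sits alongside reference data. A minimal sketch, assuming one value per institution; the institution names and figures are hypothetical and ties are not given any special treatment.

```python
def share_and_rank(values):
    """Return {name: (share_of_total, rank)}, where rank 1 is the largest value."""
    total = sum(values.values())
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: (values[name] / total, ordered.index(name) + 1)
            for name in values}


# Hypothetical research income figures for three institutions.
income = {"Uni A": 50.0, "Uni B": 30.0, "Uni C": 20.0}
result = share_and_rank(income)
```

The same calculation can be repeated within a State or ATN grouping by filtering the input to the members of that group first.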
External National Research Performance data
Analysis vs Reports
• The original spreadsheet report is static and only shows one measure at a time
• In the warehouse we can add State, ATN and National benchmarking totals, share, rankings
• The OLAP cube provides the ability to analyse the data rather than just look at one report at a time
• However, this data is lagging by at least one year
• We have good data quality, context, but not timeliness
• Need to load our ‘live’ research data into the warehouse daily to help make good strategic decisions
Data with conforming issues - Live Research Performance data
• need to see performance areas during the current year compared to previous years
• good and poor performing divisions and schools (org units)
• data comes from 3 transactional systems – Research Master, Finance One, and Empower HR
• the data warehouse design was successful in bringing together the data from the 3 systems
• however, the issue of non-conforming data proved to be a problem in a number of areas, including org structure, which was supposedly controlled from a central ‘master’ file
Live Research data – Publications per FTE
Org Unit code conforming issues – two transactional systems
Some alternative thoughts on data quality…
“Blame everything on the source data and point out that fixing source systems is out of scope.”
“Only use BI tools that let users export the reports to Excel where they can play with the data and produce information that looks much more accurate.”
(McBurney, Senior Consultant, 2006)
Overview
Alternative approach to managing conformed data in the warehouse.
Agenda/Contents
Background
Challenges for a warehouse startup
Additional challenges
The two approaches
Oracle Warehouse Builder
Other tools
University of SA, Data Warehouse
The University data warehouse consists of data from source systems:
Finance, HR, Student, Master data management system – i.e. Org Unit data
Research administration system – data quality challenged
Covers research and student related business areas consisting of:
10 fact tables
2 snapshot fact tables
70 dimension tables
Environment: Oracle 9i Rel2, OWB Rel 1, Cognos ver7
Challenges in a normal EDW start-up phase:
Methodology and documentation standards
Designing and developing ETL technical infrastructure
Master Data Management System – one source of the truth, i.e. in-house built application to manage Org Unit data.
Build the warehouse
Design and build the BI layer.
Extra challenges to address
Requirement for low ongoing support
Some source systems include poor data quality
Business processes and political environment not focused on data quality improvements
Containing the extra challenges through adaptive design
How can we better manage the extra challenges?
Establish Master Data Management system – one source of data
Don’t want Data Quality (DQ) issues to destroy these gains.
Therefore need a robust way to manage DQ issues in the warehouse with minimum impact and intervention.
Containing the extra challenges through adaptive design
Managing reference data with DQ issues.
Two approaches considered given our challenges:
1. Kimball recommended approach
2. University of SA, Hybrid Approach
Oracle Warehouse Builder Release 1 does not provide an automatic means to manage SCD dimension tables.
Kimball recommended approach
For an incoming fact row that has an unmatched dimensional value:
automatically create a new dimension entry placeholder as a result.
assume at a later date the dimension row which matches the placeholder will arrive and overwrite the placeholder with a full row of attributes.
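A minimal in-memory sketch of the placeholder idea described above. This illustrates the general pattern rather than Kimball's or any ETL tool's implementation; the dictionary layout, function names and the surrogate-key sequence starting at 5000 are all hypothetical.

```python
from itertools import count

# In-memory stand-in for a dimension table: business code -> row.
dimension = {"ITU": {"key": 1234, "description": "Info Tech", "placeholder": False}}
_next_key = count(start=5000)  # hypothetical surrogate-key sequence


def lookup_or_create_placeholder(code: str) -> int:
    """Return the surrogate key for code, creating a placeholder row if unmatched."""
    row = dimension.get(code)
    if row is None:
        row = {"key": next(_next_key), "description": None, "placeholder": True}
        dimension[code] = row
    return row["key"]


def load_dimension_row(code: str, description: str) -> None:
    """When the real dimension row arrives, overwrite the placeholder's attributes."""
    row = dimension.get(code)
    if row is None:
        row = dimension[code] = {"key": next(_next_key)}
    row.update(description=description, placeholder=False)


fact_key = lookup_or_create_placeholder("GPB")  # unmatched: placeholder created
load_dimension_row("GPB", "Grounds")            # late-arriving dimension row
```

Note that the fact load is what writes the placeholder into the dimension, so the dimension now has two writers: its own source and the fact source.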
Kimball recommended approach
Advantages:
No Factual data is lost (?)
Proven approach which works efficiently for large Fact tables
Some ETL tools do this work for you.
Kimball recommended approach
Disadvantages:
more than one source of data for the dimension
potentially more than one source of the truth.
if the dimension is conformed then rubbish data is made available to all areas of the data warehouse, unless it is managed.
If effective dating is involved, has the potential to corrupt contiguous date ranges.
If only part of a placeholder is available (i.e. the code and no Efft Date) from the Fact row then Fact record gets written to a log file and dim key set to unknown or Fact row is rejected completely
− Either case probably requires manual intervention to resolve.
UniSA hybrid approach
Capture Unknowns
For an incoming fact row that has an unmatched dimensional value:
1. Store the unmatched business code in the core Fact table.
2. The dimension surrogate key within the fact record is set to -1.
Hide business code from user reporting layer
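The capture step can be sketched in Python as below. It is an in-memory illustration only; the column names and the `build_fact_row` helper are hypothetical, and the actual load was built as Oracle Warehouse Builder mappings.

```python
UNKNOWN_KEY = -1  # surrogate key reserved for the 'Unknown' dimension row

# Business code -> surrogate key, as supplied by the master data source.
org_dim = {"ITU": 1234}


def build_fact_row(org_code: str, measure: float) -> dict:
    """Keep the raw business code and fall back to -1 when it is unmatched."""
    return {
        "org_bus_code": org_code,  # stored in the fact, hidden from user reporting
        "org_key": org_dim.get(org_code, UNKNOWN_KEY),
        "measure": measure,
    }


core_fact = [build_fact_row("GPB", 0.5), build_fact_row("ITU", 1.0)]
```

Unlike the placeholder approach, the fact load never writes to the dimension, so the master data source remains the only writer of dimension rows.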
UniSA hybrid approach – Unmatched Business Code
Fact Surrogate_key | | Org Unit Bus Code | Org Code Key | Org Key Version | Fact Measure |
1001 | | GPB | -1 | 1 | 0.5 |
1001 | | ITU | 1234 | 1 | 1.0 |
1001 | | ITU | 1234 | 2 | 0.9 |
UniSA hybrid approach
Reprocessing the Unknowns.
At a later date:
Copy the core fact rows into the staging table where the business code exists and the corresponding surrogate key = -1
Reconcile against the dimension table in order to obtain a known key. Reuse existing transformation mappings
Merge the Fact record back into the core Fact table
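The reprocessing steps above can be sketched as a small Python routine. It is a simplified in-memory stand-in for the staging and merge steps; the column names, the `reprocess_unknowns` function and the `XYZ` code are hypothetical.

```python
UNKNOWN_KEY = -1


def reprocess_unknowns(core_fact: list, org_dim: dict) -> int:
    """Re-resolve fact rows whose key is -1; return how many were fixed."""
    # 1. Stage the rows that carry a business code but no known key.
    staging = [row for row in core_fact
               if row["org_key"] == UNKNOWN_KEY and row["org_bus_code"]]
    fixed = 0
    for row in staging:
        # 2. Reconcile against the dimension to obtain a known key.
        key = org_dim.get(row["org_bus_code"], UNKNOWN_KEY)
        if key != UNKNOWN_KEY:
            # 3. Merge back: staged rows are the same objects as in core_fact,
            #    so updating them here updates the core fact list.
            row["org_key"] = key
            fixed += 1
    return fixed


core_fact = [
    {"org_bus_code": "GPB", "org_key": -1, "measure": 0.5},
    {"org_bus_code": "ITU", "org_key": 1234, "measure": 1.0},
    {"org_bus_code": "XYZ", "org_key": -1, "measure": 0.2},
]
org_dim = {"ITU": 1234, "GPB": 1235}  # GPB has since arrived from the master source
fixed = reprocess_unknowns(core_fact, org_dim)
```

Rows that still fail to match stay at -1 with their business code intact, so they remain available for the next run and for an unknowns report.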
UniSA hybrid approach – Reprocessed Business Code
| Org Code Key | Org Key Version | Org Code | Org Description | Current_Flag
| -1 | 1 | | Unknown | Y
| 1234 | 1 | ITU | Info Tech | N
| 1234 | 2 | ITU | Information Tech | Y
| 1235 | 1 | GPB | Grounds | Y
UniSA hybrid approach
Advantages:
No Factual data is lost.
The one data source controls the truth for each dimension table.
Automatic poor data quality quarantine
Data quality issues peculiar to the given source system are not propagated throughout the entire warehouse.
No ongoing maintenance overhead with potential accumulation of rubbish data within dimensions.
No need for an ever-expanding number of fix up scripts.
DQ issues can be handled on a subject area basis, assisting in prioritization.
Unknowns report per Fact subject area available for DQ department.
UniSA hybrid approach
Disadvantages:
Fact table requires extra processing on a regular basis in order to reconcile the unknown dim keys.
Requires that the raw business codes are present in the Fact (not necessarily visible for user reporting)
Possibly not suitable for very large Fact tables (> 10 million fact table records) where DQ is an ongoing issue, but OK for Uni data volumes.
Oracle Warehouse Builder (OWB)
Low cost
Mappings automatically perform bulk inserts
Excellent ETL auditing information available
Process Flow allows forking of multiple database sessions
Use Emphasis on Graphics
OWB Process Flow – control
OWB Process Flow – session forking
More alternative thoughts on data quality…
“Default null values to the word ‘unknown’. If anyone questions this point out that unknown is used liberally throughout all the source systems and is more useful than not knowing that it is unknown.”
“You will soon find that your information management projects are being delivered on time and are no less accurate than the source systems”
(McBurney, Senior Consultant, 2006)