By Jason Perkins & William O’Shea
Mission Critical BI in an EDW 2.0 world
Jason B Perkins
• Chief Architect on the Secondary Uses Service (SUS) Programme for BT Health
• Over 10 years working on some of the world’s largest and most complex Business intelligence and Data warehouses programmes.
• Highlights from Career
o Lead BI Architect for BT Retail
o ODM Consumer Reference Set (CRS)
o BT Mobile Data Strategy
o National Name & Address Database (NAD)
o Solution Architect for (Swift) BT Marketing Data warehouse
• Qualifications
o TDWI Certified Business Intelligence Professional (CBIP)
o DAMA Certified Data Management Professional (CDMP)
• Subject matter expertise across Health, Retail and Telecoms
Will O’Shea
• Data Warehouse Consultant at AMS Systems, currently assigned to the NHS
• Over 20 years of experience in consulting, focusing on Data Warehousing and Oracle RDBMS.
• Highlights from Career
o Worked with Gene Amdahl
o Development Lead at Oracle
o Oracle Consultant at Blue Cross
o DWH Consultant at Pfizer
o Data Warehouse consultant at Johnson & Johnson; awarded Innovation award for “Data warehouse in a box”
o Technical Architect at the NHS, awarded Champagne award by Atos Origin for implementing RDM process
• Education
o MBA from University of Manchester (MBS)
o BSc from University of Waterloo, Canada
o Oracle Certified Professional (10g DBA)
• Subject matter expertise: Financial, Healthcare & Pharmaceutical.
About the Presenters
Agenda
• MCBI - The Business View
• Mission Critical Architecture
• Mission Critical Method
* BREAK *
• Mission Critical Principles & Operating Model
• Mission Critical Building Blocks
• Summary
Business Intelligence?
Expert Knowledge
Facts
Intuition
“In God we trust. Everyone else, bring data.”
W. Edwards Deming
Types of BI?
Operational BI
- Optimise & track core operational processes
- Bottom up
- Detailed
- Monitoring
Tactical BI
- Project analysis and departmental activities
- Departmental
- Detailed / Summary
- Analysis
Strategic BI
- Strategic execution and analysis
- Top down
- Summary
- Management
TDWI “Three threes of Performance Dashboards”
Mission Critical BI
Mission Critical BI: “Systems that merit mission-critical status are those that affect a range of business processes, and warrant service-level agreements that align the business needs with system performance.” Gartner
Do not confuse it with the many other faces and names in BI:
• Real Time / Right Time BI
• Real time integration / Data freshness
• On Demand BI
• High availability BI
Mission Critical BI – Why?
Business 2.0
• Always on
• Self service
• Joined up - 360 view of the customer
• Available everywhere
BI/DW is no longer a back office function / system.
• Cost of entry in most industries.
• What you do with it remains a competitive differentiator.
[Diagram labels: Pervasive Business Intelligence, Globalisation, Zero Latency Enterprise, Operational Decision support]
“Enterprises compete by using up-to-date information to progressively remove delays to the management and execution of its critical business processes.” Gartner
Mission Critical BI – Real World Examples
• E-everything – 24x7 E-Government
• Health care monitoring – Commissioning / Payment for quality / results; Referral to treatment times; Payment for Quality
• Telecommunication – Bandwidth management / Mobile Coverage; Order to fulfilment MIS
• Retail – Just-in-time inventory
Mission Critical – Challenges
Mission Critical BI is not new! So why is it so hard?
• “Pace of change” keeps increasing …
• Continued pressure on IT spend – estimated ~20-30% reduction in 2009/10.
• BI / DW keeps evolving – many of the original mission statements of BI/DW remain elusive.
• Increased demand for integrated information – e.g. unstructured, social media, etc.
• Data explosion – “Data volumes will grow exponentially while CPU capacity will increase only geometrically.” Gartner
• Security of all the information is paramount.
• BI/DW remains a predominantly “build” activity.
Mission Critical – EDW Scale
• Complexity: Business Model; Data Integration; Mixed Workload
• Exploitation: Number of Users; Exploitation Maturity
• Size: Data Loaded; Data warehouse size; Information output
A number of different views need to be considered when quantifying the challenge ahead.
The scale varies by industry, type of business and geography.
Mission Critical BI Architecture
EDW Architectures
Independent Data Mart
• For: easy to build organizationally; limit scope; easy to build technically
• Against: business enterprise view unavailable; redundant data costs; high ETL costs; high app costs; high DBA and operational costs
Virtual Data Warehouse
• For: no need for ETL; no need for separate platform
• Against: only viable for low volume access; meta data issues; network bandwidth and join complexity issues; workload typically placed on op systems
Hub & Spoke
• For: allows easier customization of user interfaces and reports; tailor spokes for business
• Against: business enterprise view challenging; redundant data costs; high DBA and operational costs; medium ETL costs; data latency
Central Data Warehouse
• For: single enterprise “business” view; data reusability; consistency; lowest TCO
• Against: requires corporate leadership and vision; requires fully performant and scalable technology
[Diagram: Mission Critical, combining Maximum Availability and Adaptability. Labels: Flexibility, Maintenance, Security, Lifecycle, Method, Infrastructure, Operations, Migrations, Technology]
Mission Critical DW Architecture
[Diagram. Sources: OLTP & ODS Systems, Business Applications, Excel, XML, Business Process. Tiers: Staging Tier, Operational Tier, Integration Tier, Performance Tier. Consumers: BI Applications]
Mission Critical DW Architecture
[Diagram, expanded view]
Sources: External; Business Applications; Unstructured; Excel; XML; Business Process
Tiers: Staging Tier; Integration Tier; Data Quality; Performance Tier
Consumers: BI Applications; Operational
Components and services: OLAP; Sandpits; Aggregates; Consolidation; Marts; Auditing; Customer Tracking; Survivorship; MDM; Problem Resolution; Alerts; Dashboards; Ad hoc Query; Reporting; Web Services; Analytics; Conforming; Security; Loader Services; Change Data Capture; Data Extracts; Community Management; Error Management; Metadata Services; Workflow Monitor; Recovery / Restart; Job Scheduling; Resource Management; SCD Manager; Fact Loader; Adoption Services; Validation Services
Serviceability Architecture
• Automation – lights out / zero touch
• Flexibility – meta data / reference data driven
• Robustness – error tracking, handling & reporting
• Operationally ready
• Maintenance – load/event tracking & reporting
• Resilience – ability to stop individual parts of the system and restart them
Mission Critical Method
Nursery Method – Raison d'être
BI/DW requires an iterative approach. Mission critical is no different.
New deliveries and changes must:
• Protect core services
• Facilitate “pace of change”
• Support re-use
• Allow experimentation
• Adapt to changing requirements
• Involve users
Developed the “Nursery” Method in response:
• Supports front room and back room deliveries
• Reduces cycle time
• “Nurseries” (AKA Sandboxes) – user initiated ETL process
• Production of Transformation and Load templates
Nursery Method – Growing a system
• Everyone, business & developers, learns from both development and use of the system.
• Introduces the ability to act on what has been learned.
• The system leaves the Nursery when mature, and is transplanted into production – not re-grown.
[Cycle diagram labels: Planting the seed, Initial Planning, Planning, Requirements, Analysis & Design, Implementation, Testing, Evaluation, Delivery, Transplant, Nursery]
Nursery Method – The Growing Stages
1. Initial Planning
   1. High level overall plan
      1. How long are iterations
      2. What deliverables are required
   2. High level requirements
2. Planning
   1. Integrated small teams
   2. Detailed iteration plan
   3. Higher level plan for the 2nd & 3rd iterations
3. Requirements
   1. Requirements for iteration
      1. Should fit within iteration
      2. or get broken into small bits
         1. Start with lowest level
4. Analysis & Design
   1. Integrated small teams
   2. Design specification
5. Implementation
   1. Did I mention integrated small teams
   2. Elaboration & implementation specification
6. Testing
   1. By both business and developers
7. Delivery
   1. Delivery to users
8. Evaluation
   1. User feedback
   2. Quality reports
9. Transplant
   1. Final delivery should match 1.1 somewhat
Nursery Method – Creating a Nurturing Environment
First Steps
1. Initial Plan
   1. Overall objective?
   2. By when?
2. Define Roles
   1. Assign roles
      1. Business roles?
      2. User roles?
      3. Supplier roles?
   2. Commitment from those in the roles!!
3. Define communication
   1. Meetings?
      1. Frequency
      2. Types
         1. Periodic weeding - Scrum
         2. Watering sessions – Stand-ups
         3. Others
      3. Roles involved in each
   2. Tight integration of roles
      1. Documentation from each role – small
      2. Frequency of documentation
      3. Type of documentation
4. Define outputs from each iteration/phase
   1. Plan for cycle
      1. Roles involved at what stage
      2. Requirement documentation – small
5. Initial Schedule
   1. Length of iterations
   2. Potential number of iterations
Building the Nursery
Nursery Method – Creating a Nurturing Environment
Next Steps
1. Define system requirements
   1. Number of data suppliers?
   2. Amount of data?
   3. Number of users?
   4. Size of infrastructure required
2. Define first few iterations
   1. Cycle 1
      1. Get data?
      2. Load data?
      3. Extract data?
      4. Distribute data?
   2. Cycle 2
      1. Build some validation?
      2. Extract validation outcome?
   3. Cycle 3
      1. Build in some robustness?
Size of Plot
Growth cycles
Nursery Method – Principles
Focuses on:
• Users – not processes and tools
• Working systems – not exhaustive documentation
• Working together – not adhering to the contract
• Delivering what is wanted – not following a plan
• Adapting to change – not issuing change requests
Both the left side and the right side must exist, but the emphasis is on the left, not the right.
Benefits
• Cycle time from months to weeks, even days!
• Improve quality – leverage “Lessons Learned” as they happen
• Reduce cost and delivery time
• Happy users!!!
Our real world examples
• Large international pharmaceutical company (delivered in months, not years)
• Healthcare provider (implemented new functionality in days)
Nursery Method – Greenhouses - Sandboxes
• What constitutes a sandbox
• What are the characteristics
• How do they need to act & interact
Users’ play areas
• Using the “Build Once – Use Many” principle, users can:
o Load new data sets
o Create new tables
o Create new reports
o Play with existing data
• Needs Work Flow Management – key in a Mission Critical system
• Isolates the effects of users’ play areas from production
• Does not isolate the data:
o Users can access production data
o Other users can access their data
• A mechanism should exist to release into Production – if required
• Sandboxes are not Production, but rather a pathway to production
• Sandboxes are used as design, not as code
Nursery Method – Planning
[Gantt chart: Task and Roles plotted against the Initial Phase (Weeks 1-6) and Cycles 1, 2 and 3 (Weeks 1-4 each), five days per week]
Initial Phase
• Define Nursery: agree cycle length; agree cycle outputs; agree documentation style; agree meeting schedule
• Define Functional Scope: agree functional requirements; agree scope of nursery; agree exit strategy; agree failure criteria
• Define Non-Functional Scope: agree # of users; agree initial record layout; agree volume; agree software usage
• Define Architecture: agree kit location; agree kit description; acquire kit; deploy kit; deploy software
• Define first Growing cycle: agree cycle 1 requirements; agree loading mechanism; acquire data; agree extract mechanism; propose next few scopes
Cycle 1
• Planning: plan cycle; high level plan for next cycle; even higher level plan for next but 1 cycle
• Requirements: refine available requirements; prioritise available requirements; agree cycle's requirements; propose next cycle's requirements
• Analysis & Design: elaborate cycle's requirements; agree design
• Implementation: build/develop/test
• Testing: unit testing; integration testing; system testing; FFP testing
• Delivery: make available to users; user acceptance testing
• Evaluation: user evaluation; review evaluation; impact requirements; adjust requirements
Documents
• Nursery Definition; High level functional Scope; Architecture Definition; NFR agreement; Nursery Plan; Requirements documentation; Evaluation documentation
Differentiate between types of changes – one size does not fit all. The category determines how many cycles a change should stay in the Nursery.
• Minor changes to reports and semantic layer
o Category 1 – Changes to pre-canned reports / extracts that do not require changes to the semantic layer.
o Category 2 – Deployment to live of new reports created by information analysts.
o Category 3 – Simple changes to the semantic layer.
• New reports
o Category 4 – Creation of new reports / extracts.
• Changes impacting the semantic layer
o Category 5 – Other changes to the semantic layer; creation of new derived fields (not to be performed in the universe).
o Category 6 – Changes to pre-canned reports / extracts that require changes to the semantic layer.
o Category 7 – Creation of a new semantic layer.
Nursery Method – Exploitation – Managing “live” changes
Nursery Method – Exploitation elaboration workflow
Mission Critical Principles & Operating Model
Mission Critical Adaptability
“Pace of change” – keeps increasing …
• It's all about speed
o Speed of change
o Speed of information access
• “Design for change” – as opposed to “built to last”
• Design to: Build Once – Use Many
• Enter “Business Rule Management” (BRM)
o Process – Business Process Management (BPM)
o Rules – Decision logic
o Data – Decision variables
[Diagram: Process, Rules, Data]
Mission Critical Adaptability
• Design for change
o Process – Business Process Management for operational decision support; process flow or workflow for tactical / strategic decision support
o Rules – rules drive the process: declarative approach, business user managed, descriptive
o Data – meta/reference data enforces the rules, thus data drives the process: contextual, volatile, flexible
[Diagram: Process, Rules, Data]
Mission Critical Adaptability
Examples of rules management …
Operational Principles – Flexibility
Users require “flexibility” without the need to re-develop.
They need to be able to add and/or modify:
• Load process
• Application processing
• Error processing
• Validations
• Recipients of load statistics (DQ, errors, etc.)
• Encryption process
• Load and use new data (joined to existing data)
As and when they want to
Without new code!!!
Operational team require the ability to configure and monitor processes.
• View ETL progress (real time)
o Loads
o Load steps
o Load statistics
• Reporting and tracking by:
o Load
o Business Unit
o Time
o Status
• Performance and statistic reporting.
Operational Principles – Maintenance
• Error tracking & maintenance against Load
• Control Loads if needed
o Start (automatically & manually)
o Hold/Pause all or part of a load(s)
o Stop Loads
o Restartable (from where needed)
• System should output meaningful & understood error messages.
• Specific messages throughout the application, so the business knows the area.
• Visibility of Operations error maintenance.
• Ability to feed into process
Operational Principles – Administration
• Statistical Real-time reporting & tracking of loads.
• Know what data has been loaded
• Know how much data has been loaded
• Know what stage each load is at.
• Know what business units have loaded data.
Business require Knowledge
Business & Operations require a robust & resilient system
• Loads may be automatically restarted from where they were stopped/failed (as required)
• Each load job, step and statistic has start/end times and status
• ETL checks status of job to determine if it needs to/can be run.
• Fatal errors need manual intervention before they may be rerun.
• Performance and statistic reporting
• Self initiating Loads
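The restart behaviour in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the EWOC implementation: `StepRecord`, `FatalLoadError`, `run_load` and the status codes are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional


class FatalLoadError(Exception):
    """Hypothetical exception for errors that need manual intervention."""


@dataclass
class StepRecord:
    # One row of load tracking: step name, status and start/end times.
    name: str
    status: str = "PENDING"  # PENDING, DONE, ERROR or FATAL (assumed codes)
    started: Optional[datetime] = None
    ended: Optional[datetime] = None


def run_load(steps: List[StepRecord],
             actions: Dict[str, Callable[[], None]],
             fatal_cleared: bool = False) -> bool:
    """Run a load, resuming after the last completed step.

    Returns True when every step is DONE, False when the load stopped.
    """
    for step in steps:
        if step.status == "DONE":
            continue  # completed in an earlier run: do not repeat it
        if step.status == "FATAL" and not fatal_cleared:
            return False  # fatal errors wait for manual intervention
        step.started = datetime.now(timezone.utc)
        try:
            actions[step.name]()
            step.status = "DONE"
        except FatalLoadError:
            step.status = "FATAL"
        except Exception:
            step.status = "ERROR"
        step.ended = datetime.now(timezone.utc)
        if step.status != "DONE":
            return False
    return True
```

Calling `run_load` a second time with the same `steps` list skips the steps already marked DONE, which is the "restart from where it stopped" behaviour. A step marked FATAL is only retried when the caller passes `fatal_cleared=True`, standing in for the manual intervention.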
Operational Principles – Resilience
How?
• Where can MDP help your DWH?
• What Metadata does MDP need?
• Feed MDP into Development stream?
• Educate developers to use it
• Educate users to request it
• Educate the business to use it
Operational Principles – Summary
ERROR (EWOC App)
ERROR_ID
STAT_CD
SYS_DT
PKGE_NM
MODL_NM
KEY_VALS
SQL_CD
SQL_ERRM
MSG_ID
MSG_TXT
APPSYS_ID
LOAD_RUN_ID
OP_ID
PROC_STEP_NM
OTH_CD_MSG
OTH_INFO
STAT_MSG
SYS_USER
TRACE_ID
CRE_DATE
CRE_USER
UPD_DATE
UPD_USER
LOAD_CHECK (EWOC App)
LOAD_RUN_ID
OP_ID
CHK_NM
STAT_CD
STRT_DT
END_DT
MESSAGE
CRE_DATE
CRE_USER
UPD_DATE
UPD_USER
LOAD_RUN (EWOC App)
LOAD_RUN_ID
LOAD_RUN_NM
STAT_CD
STRT_DT
END_DT
APPSYS_ID
CRE_DATE
CRE_USER
UPD_DATE
UPD_USER
LOAD_SRC (EWOC App)
LOAD_RUN_ID
OP_ID
SRC_OBJ_NM
STAT_CD
LOADED_TO_DT
LOADED_FROM_DT
STRT_DT
END_DT
CRE_DATE
CRE_USER
UPD_DATE
UPD_USER
LOAD_STEP (EWOC App)
LOAD_RUN_ID
OP_ID
STEP_NM
STAT_CD
STRT_DT
END_DT
CRE_DATE
CRE_USER
UPD_DATE
UPD_USER
LOAD_STEP_STAT (EWOC App)
LOAD_RUN_ID
OP_ID
PROC_STEP_NM
STAT_CD
ROWS_AFFECTED
ROWS_SELECTED
ROWS_INSERTED
ROWS_UPDATED
ROWS_DELETED
ROWS_MERGED
ROWS_CORRECTED
ROWS_DISCARDED
KEYS_GENERATED
KEYS_UPDATED
STRT_DT
END_DT
CRE_DATE
CRE_USER
UPD_DATE
UPD_USER
MSG (EWOC App)
MSGAPP_ID
MSG_ID
SEVERITY_IND
MSG_TXT
EFF_DATE
TERM_DATE
CRE_DATE
CRE_USER
UPD_DATE
UPD_USER
MSG_APP (EWOC App)
MSGAPP_ID
MSGAPP_NM
APPSYS_ID
EFF_DATE
TERM_DATE
CRE_DATE
CRE_USER
UPD_DATE
UPD_USER
MSG_HLP (EWOC App)
MSG_ID
MSGHLP_SEQ
CAUSE_SOL_IND
CNTRY_ID
MSGHLP_TXT
CRE_DATE
CRE_USER
UPD_DATE
UPD_USER
Data Warehouses require “Metadata Driven Processing” (MDP)
What can be MDP and what can’t?
• Loading Data – Types of loads, Source to target
• Load Control – Starting, stopping, branching, etc
• Errors & Messages – effects of & reporting on
• Validation (DQ) – how, what, when & reports
• Encryption – how, what & when
• Reference Data Processing
Metadata Driven Processing – Enterprise Warehouse Operational Components (EWOC) – The Concept
[Concept diagram labels: Instance of Job; Load Step; Load Statistics; Validation Outcome; Business Unit; Message; Work-Flow; Severity; Project; Data Integration & Quality Team?; Application; Users (Admin); Validation Rules; …]
Job: collection of Steps; has a start and an end
Job Step: Get data; Load staging; Load Atomic; Human Interaction; etc.
Source: SUS; Cancer Registry; Internal
Target: Internal; BO / OBI
Type: CSV File; XML; Table; Report/Extract
Validation: Lookups; Static values; Data Quality; Patterns; Linkage; Man/Ops; etc.
Business Unit: BU Job; BU Job Step; Schema Source; Schema Target
BU Validation: additional; less the non-mandatory
Infrastructure: Storage Allocation; CPU Allocation; Memory Allocation; Sand Pit Schemas
Message: Validation; Load; Processing
Severity: Fatal; Error; Warning; Information
Cause & Solution
Project
Metadata Driven Processing – The Metadata Driven ETL
[Diagram: Job, Job Step, Source, Target]
Type: CSV File; XML; Table; Report/Extract
Validation: Lookups; Static values; Data Quality; Patterns; Linkage; Man/Ops; etc.
Metadata Driven Processing (MDP)
• Definition of Jobs
o Loads are specific instances of a Job
• Build re-usable modules
• Metadata driven code, promote MDP
• Quicker time to delivery, develop and test once
• Add/Change source and target by changing MDP data
• Add/Change ETL by changing MDP data
• Pick Lists
o Defined by Reference data
o Examples: date range validation; foreign key lookups; mandatory / optional; dd-mm-yyyy vs. yyyy/mm/dd; Y/N vs. 1/0
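To make the pick list idea concrete, here is a minimal Python sketch in which the checks are coded once and the rules are plain data. Everything named here (`PICK_LIST`, `RULES`, `validate`, the column names and the severities) is an assumption for illustration, not the presenters' implementation.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List, Tuple


def _parse(value: Any, fmt: str):
    """Return a datetime, or None when the value does not match the format."""
    try:
        return datetime.strptime(value, fmt)
    except (TypeError, ValueError):
        return None


def check_mandatory(value, params) -> bool:
    return value not in (None, "")


def check_date_format(value, params) -> bool:
    return _parse(value, params["format"]) is not None


def check_date_range(value, params) -> bool:
    parsed = _parse(value, params["format"])
    if parsed is None:
        return False
    low = _parse(params["min"], params["format"])
    high = _parse(params["max"], params["format"])
    return low <= parsed <= high


def check_lookup(value, params) -> bool:
    return value in params["allowed"]


# The pick list: building blocks that are developed and tested once.
PICK_LIST: Dict[str, Callable[[Any, dict], bool]] = {
    "MANDATORY": check_mandatory,
    "DATE_FORMAT": check_date_format,
    "DATE_RANGE": check_date_range,
    "LOOKUP": check_lookup,
}

# Rules are metadata rows; adding or changing one needs no new code.
RULES: List[dict] = [
    {"column": "patient_id", "check": "MANDATORY", "params": {},
     "severity": "ERROR"},
    {"column": "visit_date", "check": "DATE_FORMAT",
     "params": {"format": "%d-%m-%Y"}, "severity": "WARNING"},
    {"column": "visit_date", "check": "DATE_RANGE",
     "params": {"format": "%d-%m-%Y", "min": "01-01-2000",
                "max": "31-12-2030"}, "severity": "INFORMATION"},
    {"column": "active", "check": "LOOKUP",
     "params": {"allowed": {"Y", "N"}}, "severity": "ERROR"},
]


def validate(row: dict, rules: List[dict]) -> List[Tuple[str, str, str]]:
    """Return (column, check, severity) for every rule the row fails."""
    failures = []
    for rule in rules:
        check = PICK_LIST[rule["check"]]
        if not check(row.get(rule["column"]), rule["params"]):
            failures.append((rule["column"], rule["check"], rule["severity"]))
    return failures
```

Switching a source from dd-mm-yyyy to yyyy/mm/dd, or from Y/N to 1/0, then means editing the `params` of a rule row rather than the code.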
Metadata Driven Processing – The Jobs - ETL
Message: Validation; Load; Processing
Severity: Fatal; Error; Warning; Information
Cause & Solution
• Fatal – fails the load
o Invalid file format
• Error – load keeps going
o Max number of errors?
o % of load rather than #
• Warning – not following rules
o Date format etc.
• Information – no effect on load
o Dates out of range
o Visit after treatment
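The severity behaviour above can be sketched as a small Python function. The message IDs, the `MESSAGE_SEVERITY` table and the 5% default threshold are invented for illustration; only the four severity levels and their effect on the load come from the slide.

```python
from typing import Dict, Iterable

# Message metadata: message ID -> severity. In a metadata driven design this
# would live in a table; the IDs and severities below are made-up examples.
MESSAGE_SEVERITY: Dict[str, str] = {
    "MSG-001": "FATAL",        # e.g. invalid file format
    "MSG-002": "ERROR",        # e.g. row rejected
    "MSG-003": "WARNING",      # e.g. unexpected date format
    "MSG-004": "INFORMATION",  # e.g. date out of range
}


def load_outcome(raised: Iterable[str], rows_in_load: int,
                 max_error_pct: float = 5.0,
                 severities: Dict[str, str] = MESSAGE_SEVERITY) -> str:
    """Return 'FAILED' or 'COMPLETED' for the messages raised by a load."""
    errors = 0
    for msg_id in raised:
        severity = severities[msg_id]
        if severity == "FATAL":
            return "FAILED"  # a fatal message fails the load immediately
        if severity == "ERROR":
            errors += 1
            # The threshold is a percentage of the load rather than a count.
            if rows_in_load > 0 and 100.0 * errors / rows_in_load > max_error_pct:
                return "FAILED"
        # WARNING and INFORMATION never stop the load.
    return "COMPLETED"
```

Because the severity is looked up from data, re-grading a message (say from FATAL to WARNING) changes how the load behaves without touching the function.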
Metadata Driven Processing – The Messages – Driving force
• Helps with future occurrences
• Updated & maintained
• Usage
o Error reporting
o Textual objects
o Information messages
o Load reporting
o Load control
• Supports MDP
• Feeds Metadata Driven ETL
• Should be used throughout ETL
o Failure checks/traps
o Exceptions
o Reporting (DQ & Validation)
o Each error/trap/exception has a unique Message ID
o Headings/Titles/Text
• Severity can be changed
o Changes processing when changed
Message Grouping
Metadata Driven Processing – Data Quality & Linkage
[Diagram labels: Validation Rules; Metadata Driven ETL; Validation Outcome; Reports; Audit Data; Business Unit (Canadian Office, Finnish Office, UK Office); Data Integration & Quality Team?]
• Supports MDP
• Key in any system, but more so in a MC one
• Use metadata to drive the process
• Important that the right people get the right data
o Quickly
• Rules based validation
o Data Quality validation
o Linkage validation
• New rules can be added/removed
o When needed (no code required)
o Business users decide to add rules
o From pick list
o Defined using building blocks
• Severity of failure of a rule can be changed
o When needed (no code required)
o Business users decide severity
Validation rule types: Lookups; Static values; Range; Conversions; Patterns; Linkage; Man/Ops; etc.
Metadata Driven Processing – Encryption
[Diagram labels: Source Data; Metadata Driven ETL; Target Data; Audit Data; Encryption Type; Source & Target Definition; Parameters (keys); Column Type]
• Supports MDP
• Encryption is simply a specific instance of a Job
o Built to perform encryption
• New encryption types can be added but do require code
• New columns to be encrypted can be added by simply adding metadata, no code
• Keys can be stored or added at run-time
Encryption types: AES128; Triple DES; Look-up; Home-grown?
Column types: Name; DoB; ID #
Metadata Driven Processing – Reference Data Management
[Diagram labels: Source Data; Metadata Driven ETL; Target Data; Audit Data; Reference Table Definitions; Column Definitions; Table Types; Business Unit; Import Types; Source Definitions; Source Attribute Definitions; BU Sources]
• Supports MDP
• New reference data can be added without new code
• Different BUs can have different data but through the same RDMT
• Different import types are catered for, e.g. CSV, XML, Excel
• Different table types are catered for, e.g. K-Type 1, 2 & 3, home grown, etc.
Metadata Driven Processing – The Metadata Model
AUDIT: o CRE_DATE, o CRE_USER, o UPD_DATE, o UPD_USER
JOB DEPENDENCY: * EFF_DT, * TERM_DT, o DESCN
CHECK PARM TYPE: # ID, * NM, o DESCN
CHECK PARAMETER: # SEQ_NUM, # PARM_VLAUE, * EFF_DT, * TERM_DT
MESSAGE HELP: # CAUSE_SOL_IND, * MSGHLP_TXT
LOAD CHECK: * STAT_CD, * STRT_DT, o END_DT, o MESSAGE
MESSAGE APPLICATIONS: # ID, * NM, * EFF_DATE, * TERM_DATE
ERROR: # ID, o STAT_CD, o SYS_DT, o MODL_NM, o PKGE_NM, o KEY_VALS, o SQL_CD, o SQL_ERRM, o MSG_TXT, o OTH_CD_MSG, o OTH_INFO, o STAT_MSG, o SYS_USER, o TRACE_ID
SYSTEM PARAMETER: # CD, * VALUE
LANGUAGE: # ID, * NM
CURRENT LOAD: o STAT_CD, o STRT_DT
MESSAGES: # ID, * SEVERITY_IND, * MSG_TXT, * EFF_DATE, * TERM_DATE
APPLICATION SCHEMAS: # ID, * NM, o DISPLAY_NM, o DESCN
APPLICATION OBJECT: # ID, * NM, o DISPLAY_NM, o DESCN
APPLICATION OBJECT TYPE: # ID, * CD, o NM
APPLICATION OBJECT DETAIL: # ID, * NM, o DISPLAY_NM, o DESCN
LOAD SRC: # SRC_OBJ_NM, * LOADED_FROM_DT, o LOADED_TO_DT, o STAT_CD, o STRT_DT, o END_DT
LOAD STEP STAT: # PROC_STEP_NM, o STAT_CD, o STRT_DT, o END_DT, o ROWS_AFFECTED, o ROWS_SELECTED, o ROWS_INSERTED, o ROWS_UPDATED, o ROWS_DELETED, o ROWS_MERGED, o ROWS_CORRECTED, o ROWS_DISCARDED, o KEYS_GENERATED, o KEYS_UPDATED
LOAD STEP: # OP_ID, o STEP_NM, o STAT_CD, o STRT_DT, o END_DT
LOAD RUN: # ID, o NM, o STAT_CD, o STRT_DT, o END_DT
COUNTRY: # ID, * ISO_CD, * NM
CHECK TYPE: # ID, * CD, * NM, o DESCN
CHECK: # ID, * NM, o DESCN, o PKGE_NM, o MODL_NM
JOB CHECK: # SEQ_NUM, * EFF_DT, * TERM_DT
JOB STEP: # ID, * SEQUENCE, * NM, o DESCN
JOB: # ID, * NM, o DESCN
APPLICATION SYSTEM: # ID, * NM, o DISPLAY_NM, o DESCN
DATA WAREHOUSE: # ID, * NM, o DESCN
Extending the Mission Critical Data Warehouse.
Most BI/DW requirements are not green field.
Extending the existing warehouse is a key design objective.
• Build Once – Use Many
o Adding new data sources
o Changing existing data sources
• Data lineage - Metadata
o Where data has come from
o Where it has gone
o What has happened to it along the way
o Impact analysis
• New exploitation (analysis and reporting) of existing DW
• Adding new exploitation capabilities to DW
Metadata Driven Processing – Extensibility
Audit Data
More building blocks
Technology Drivers
Examples of technology features supporting Mission Critical BI.
• Analytics outside Data warehouse
• BI Web Services
• High Availability Data Warehousing
• Real-Time Data Warehousing
• Master Data Management (MDM)
From “TDWI Best Practice Report, Next Generation Data Warehouse Platforms”, by Philip Russom.
Mission Critical Performance
• Leaving the Nursery (or Sandbox)
o Productionise the code
o Performance!!
• Balance brute force
o MPP (medium to high volumes / complexity / users)
o SMP (low volume / complexity / users)
• Performance layer
o BI tool and RDBMS calibration
o Speed of ETL vs. need of retrieval - when to do something and when not to
• 80 – 20 rule
o Selective denormalisation
o Selective pre-joins
o Aggregates and summaries – are they always needed? DWA no? SMP yes?
o OLAP
• Performance metadata
o Row counts
o Elapsed time
Mission Critical Administration
• Not all BI is mission critical – phew!
• Prioritise resources for Mission Critical
o BI applications
o Back office workload
• Resource management
Information Lifecycle Management
• Not all information is mission critical – phew!
• Many benefits to segmenting information by its usefulness to the business:
o Performance / throughput
o Cost effectiveness
o Prioritisation of resources
• ILM - number of levels
1. Separate active and non-active data.
2. Compress non-volatile data.
3. Read only for historic data.
4. ILM - intelligent storage based on usage of information.
• Automation is a key (emerging) requirement for supporting MCBI.
Mission Critical Security
Security includes …
• Business Continuity
• Confidentiality
• Information Classification
• Non Repudiation
• Privacy
Apply the principle of “defence in depth”, with multiple layers relating to security of information.
Protecting customer identifying information:
• Pseudonymisation (P14n)
• Anonymisation
• Linkage across datasets and over time, but NOT customer identifying
• Usable
Audit Services: provision of audit trail for
• Transactions applied to the database.
• Access to data in the database.
Mission Critical Security
Pseudonymisation (P14n)
• Encryption
• Reversible / Non Reversible
• Substitution
• Surrogates
• Anonymisation
Other considerations
• Harvesting / Sharing
• Usability of output
• Key destruction
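As a rough illustration of the reversible / non-reversible distinction, the Python sketch below pairs a keyed hash (non-reversible, but still linkable) with a surrogate lookup (reversible for whoever holds the table). The function and class names are hypothetical, and HMAC-SHA256 is a swapped-in technique chosen for the example rather than one named in the presentation.

```python
import hashlib
import hmac
import itertools
from typing import Dict


def pseudonymise(identifier: str, key: bytes) -> str:
    """Non-reversible pseudonym: a keyed hash (HMAC-SHA256) of the identifier.

    The same identifier and key always give the same pseudonym, so records
    can still be linked across datasets and over time.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


class SurrogateVault:
    """Reversible substitution: identifiers are swapped for surrogate numbers.

    Anyone holding the vault can re-identify; nobody else can.
    """

    def __init__(self) -> None:
        self._forward: Dict[str, int] = {}
        self._reverse: Dict[int, str] = {}
        self._next = itertools.count(1)

    def surrogate_for(self, identifier: str) -> int:
        if identifier not in self._forward:
            surrogate = next(self._next)
            self._forward[identifier] = surrogate
            self._reverse[surrogate] = identifier
        return self._forward[identifier]

    def reidentify(self, surrogate: int) -> str:
        return self._reverse[surrogate]
```

In this sketch, destroying the HMAC key means existing pseudonyms can no longer be regenerated from identifiers, and destroying the vault removes the route back from surrogate to identifier.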
Mission Critical Infrastructure
Mission critical infrastructure requirements
• Availability & Resilience
• Capacity on demand
• Ease of management
• Linear scalability
Data warehouse infrastructure
• “Roll your own” data warehouses – declining …
• Data warehouse appliance (DWA) – the “new” kid on the block
• Cloud services – way of the future?
Mission Critical – Maximum Availability
Data warehouses now have to cope with the following with NO downtime:
• Planned Outages
• System Changes
• Application Changes
• Migrations / Transitions
Mission Critical – Maximum Availability
Requirements
• Measured in 9’s
• No single point of failure
• Tolerates many outages transparently
• Straightforward administration
Availability and Resilience
• Active / Standby
• Active / Passive
• Dual Active
• Fallback
Backup and recovery
• Automation
• Hot vs. Cold
• Incremental vs. Full
Second site
[Diagram layers: Software, Operational, Network, Hardware]
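“Measured in 9’s” translates directly into a downtime budget. The short Python sketch below does the arithmetic for a 365-day year; the function name is made up for the example.

```python
def downtime_minutes_per_year(nines: int) -> float:
    """Permitted downtime per 365-day year for an availability of N nines.

    Two nines means 99%, three nines 99.9%, and so on.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    minutes_per_year = 365 * 24 * 60   # 525,600 minutes
    unavailability = 10.0 ** -nines    # e.g. three nines -> 0.001
    return minutes_per_year * unavailability
```

Three nines (99.9%) leaves about 525.6 minutes a year, a little under nine hours, while five nines (99.999%) leaves a little over five minutes. Planned outages, system changes and migrations all have to fit inside that budget.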
Mission Critical Service Availability
Data Migrations
New requirements –
• No downtime for on-boarding data or exploitation.
• No impact to data freshness.
• Minimise impact on existing system.
Differentiate between
• Migrations of new data sources
• Migrations for existing subject areas (more common)
Phased data migrations.
Emerging integration patterns
• Green field data migration
• Parallel trickle data migrations
• Mini batch data migrations
Mission Critical Data Migrations
• Independent data migration of (new) data source.
• Partition data migration in order to batch / trickle.
• Impact volumes against pattern to understand impact of additional throughput.
• Resource management is a key requirement to protect the existing system.
• No downtime or data freshness impact on business.
[Diagram: Data Migration, Green field. Original structures and New structures, fed by ETL and Data Migration (mini batch or trickle); steps 1, 2, 3]
Mission Critical Data Migrations
• Concurrent maintenance of new and old structures.
• Cut over on completion of data migration to new structures.
• Impact volumes against pattern to understand impact of additional throughput.
• Failure to write to either the new or the original structures must result in rollback of both.
• No downtime or data freshness impact on business.
[Diagram: Data Migration, Parallel Trickle Pattern. Original structures and New structures, fed by ETL and Data Migration (trickle); steps 1, 2]
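The "rollback of both" rule for the parallel trickle pattern can be sketched as follows. Plain Python dictionaries stand in for the original and new structures, and `write_to_both` is a hypothetical helper; a real warehouse would rely on database transactions rather than copying state.

```python
from typing import Callable, Dict

Store = Dict[str, str]
Writer = Callable[[Store, str, str], None]


def _default_write(store: Store, key: str, value: str) -> None:
    store[key] = value


def write_to_both(original: Store, new: Store, key: str, value: str,
                  write_original: Writer = _default_write,
                  write_new: Writer = _default_write) -> bool:
    """Apply one change to both structures, or to neither.

    Returns True when both writes succeed. If either write raises, both
    stores are restored to their state before the call.
    """
    # Naive snapshot for illustration only; fine for a toy in-memory store.
    before_original = dict(original)
    before_new = dict(new)
    try:
        write_original(original, key, value)
        write_new(new, key, value)
    except Exception:
        original.clear()
        original.update(before_original)
        new.clear()
        new.update(before_new)
        return False
    return True
```

Keeping the two structures in step at every write is what makes the later cut over a switch of readers rather than a reconciliation exercise.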
Mission Critical Data Migrations
• ETL maintenance of a single data structure at any point in time.
• Logically segment the source data into discrete partitions.
• Execute mini batch migrations, focusing on each partition in turn.
• Partition on volatility, with early phases based on least volatile data.
• Catch-up mini batches are required for changes during transition before final cut over.
• No downtime or data freshness impact on business.
[Diagram: Data Migration, Mini Batch Pattern. Original structures and New structures, fed by ETL and Data Migration (mini batches); steps 1, 2, 3]
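A toy Python sketch of the mini batch pattern follows: order partitions by volatility, copy them one batch at a time, then replay a change log as catch-up before cut over. The in-memory dictionaries, the volatility scores and all function names are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Partition = Dict[str, str]
Change = Tuple[str, str, str]  # (partition name, key, new value)


def batch_order(volatility: Dict[str, float]) -> List[str]:
    """Least volatile partitions first, as the pattern recommends."""
    return sorted(volatility, key=lambda name: volatility[name])


def migrate_partition(source: Dict[str, Partition],
                      target: Dict[str, Partition], name: str) -> None:
    """One mini batch: copy a single partition to the new structures."""
    target[name] = dict(source[name])


def catch_up(target: Dict[str, Partition], change_log: List[Change]) -> int:
    """Replay changes made to the source during the transition."""
    applied = 0
    for partition, key, value in change_log:
        target.setdefault(partition, {})[key] = value
        applied += 1
    change_log.clear()
    return applied


def reconciled(source: Dict[str, Partition],
               target: Dict[str, Partition]) -> bool:
    """Cut over is only safe once source and target agree."""
    return source == target
```

A typical run loops over `batch_order(...)`, calls `migrate_partition` for each name, then calls `catch_up` until `reconciled` returns True. Starting with the least volatile data keeps the catch-up batches small.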
Mission Critical Data Migrations – Pre-requisites
• Data profiling and analysis of new / changes in data migration
• Up front planning for pipe cleaning and rehearsal
Practically selective
• Only select entities you know you will need in that phase.
• If you're hitting an entity, consider taking it all.
Transition – fail to plan is plan to fail! Rehearsal is key.
• Rolling data quality monitors
• Audit and reconciliation
Summary
Mission Critical is here …
What we need is an “Intelligent Data warehouse”
• Metadata driven
• Build once – use many
Why do we need it?
• Business agility through the Nursery Method
o Facilitates “pace of change” of business.
o Protects existing Mission Critical BI services.
• Operational patterns
o Empower the business
o Support the Mission Critical BI services.
• Integrated – exploitation of the customer “360 view”
• Secure – ensuring the right information to the right person
References
Massive But Agile: Best Practices For Scaling - The Next-Generation Enterprise Data Warehouse, Forrester.
TDWI Best Practice Report, Next Generation Data Warehouse Platforms, Philip Russom.
The ETL Toolkit, Ralph Kimball.
Smart (Enough) Systems, James Taylor.
Best Practices Mitigate Data Migration Risks and Challenges, Gartner.
Questions
Thank you
Further queries contact us at:-
[email protected] [email protected] http://www.ewoc.info/