

Agile Data Warehousing for the Enterprise A Guide for Solution Architects and Project Leaders

Ralph Hughes, MA, PMP, CSM

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

ELSEVIER Morgan Kaufmann is an imprint of Elsevier


Full Contents

List of Figures xvii List of Tables xxiii Abbreviations xxv Foreword xxvii Acknowledgments xxix

1. Solving Enterprise Data Warehousing's "Fundamental Problem"

The Agile Solution in a Nutshell 1 Five Legs to Stand Upon 3 The Agile EDW Alternative is Ready to Deploy 5 Defining a Baseline Method for Agile EDW 5 Plenty of Motivation to "Go Agile" 7 Structure of the Presentation Ahead 7

Part I Summaries of Generic Agile Development Methods

2. Primer on Agile Development Methods

Defining "Agile" 13 Agile Manifesto Values and Principles 19 Scrum in a Nutshell 20

User Stories 21 Scrum's Five-Step Delivery Iteration 23

Contributions from Extreme Programming 26 XP Values and Principles 27

3. Introduction to Alternative Iterative Methods

Lean Software Development 31 Lean Origins 31 Lean Methods as a Long-Term Destination 32 Lean Principles and Tools 33

Kanban 41 Quick Sketch of the Kanban Method 41 Visualizing and Maintaining Continuous Flow 43 Evidence-Based Service Levels 44 Comparing Kanban to Scrum 45 The Hybrid "Scrumban" Approach 47 Rational Unified Process 49 RUP Overview 49 Why Not RUP for DW/BI? 52

Part I References 55

Part II Review of Fast EDW Coding and Risk Mitigation

4. Essential DW/BI Background and Definitions

Primary Source for DW/BI Standards 60 Defining Enterprise Data Warehousing 61 Basic Business Terms 63 Data and Information Terms 65 Information Services Terms 66 Software Engineering Terms 67 Basic Architectural Concepts 70 System Architecture 70 Data Architecture 71 Reference Architecture 74 Enterprise Architecture 75 Architectural Frameworks 76 Zachman Enterprise Architectural Framework 76 DAMA Functional Framework 76 Hammergren DW Planning Matrix 77 Additional Data Warehousing Concepts 79 Traditional Project Management Terms 82

5. Recap of Agile DW/BI Coding Practices

Iterative Coding Alone Significantly Improves BI Projects 85 Yet Data Integration Remains a Challenge 85

New Roles for DW/BI Projects 86 Project Architect 87 Data Modeler 88 Systems Analyst 88 System Tester 89 Proxy Product Owner 89 Scrum Master 90 Including the New Roles on the Team's Whale Chart 90 80/20 Specifications 90 Developer Stories 92 DW/BI User Stories Hide Much of the Data Integration Work 92 Developer Stories Make DW/BI Work More Manageable 93 Developer Stories Require a Deeper Understanding of Value 94 Current Estimates 95 Adding Techniques from Kanban 97 Pipelined Delivery 98 Work-in-Progress Limits for Developers 100 Iteration -1 and 0 100 Two-Pass Testing 101 Evidence-Based Service Level Agreements 102 Proof that Agile DW/BI Works 104 Investigating Project Cost Impacts in More Detail 106 Some Myths Prove True 107

6. Eliminating Risk Through Nested Iterations

EDW Programs Slip into "231 Swamps" 109 231 Swamps Derive from a Command

and Control Strategy 110 Agile's Fundamental Risk Mitigation

Technique 111 Agile's General Risk Mitigation Strategy 111 Eliminating Miscommunication with

Multiplexed Engineering Phases 113 Agile EDW's Extended Risk Mitigation

Techniques 114 Three Types of Risk Threaten EDW

Programs 114 Mitigating the Risk of Application

Coding Concept Errors 116 Mitigating the Risk of Solution Concept

Errors 116 Mitigating the Risk of Business Concept

Errors 119

Part II References 121

Part III Agile EDW Requirements Management

7. Balancing between Two Extremes

Building the Case for Effective Requirements Management 126

Developers Often Neglect Requirements Work 128

Motivating Teams to Take Requirements Seriously 128

Easy to Overinvest in Requirements Management 130 "Requirements Management" Formally

Defined 130 Traditional Projects Employ a Big Spec

Up Front 130 Requirements are Inherently Diverse 132 Business Process Reengineering Can

Add to the Complexity 135 Reasons Not to Overinvest in Requirement

Work 136 Precision at the Expense of Accuracy 137 Business Partners are Adverse to Traditional

Requirements Gathering Efforts 138 Traditional Requirements Management

Fails More than it Succeeds 139 The Greatest Failure is Losing Business

Opportunity 139 Agile's Approach Centers on Balance 141

Agile Objectives for Requirements Management 141

Knowing when a Backlog is "Good Enough" 143

Enable Regular "Current Estimates" 144 Keeping the Requirements Management

Process Agile 144 Two Intersecting Requirements

Management Value Chains 144 Salient Differences between GRM

and ERM 147 Business Analysts Implicit in Two Project

Lead Roles 149

8. Redefining the Epic Stack to Enable Value Accounting

Toward a Robust Epic Decomposition Framework 151 Defining the Backlog Hierarchy's

Structure 151


Aligning the Epic Stack to the Company's Hierarchy 152

Clearly Defining Each Level within the Epic Stack 154

Testing Whether Stories are Good Enough 156 Clarifying Everything with Value Accounting 159

The Basics of Value Accounting 160 Value Accounting Makes Developers

More Effective 161 Value Accounting Mitigates Project Risk 162

Allocating Value Throughout an Epic Tree 163 Identifying the Value of a Project 163 Allocating Value to Epics 164 Allocating Value to Themes and User

Stories 164 Value Buildups by Environment Provide

Motivation and Clarity 165

9. Artifacts for the Generic Requirements Value Chain

Beware of Requirements Churn 169 User Modeling/Personas 170 End Users' Hierarchy of Needs 171 Benefits Offered by the BI Hierarchy of Needs 173 Mind Maps and Fishbone Diagrams 174 Vision Boxes 176 Vision Statements 176 Product Roadmaps 178

10. Artifacts for the Enterprise Requirements Value Chain

The Generic Value Chain Can Overlook Crucial Requirements 181

ERM as a Flexible RM Approach 183 Focusing on Enterprise Aspects of Project

Requirements 184 Functionality Dimension 184 Polarity Dimension 185 Orientation Dimension 185 Streamlined ERM Templates 186

Uncovering Project Goals with Sponsor's Concept Briefing 186 Justification Type 187 Customer Experience Impacts 188 Functional Area Impacts Assessments 188 Value of the Program 188 Program Success Metrics 189

Identifying Project Objectives with Stakeholder's Requests 189 Business System Challenges 189 Current Manual Solution 189 Desired Business Solution 190

Volume Requirements and End-User Census 190 Dependent Systems 190

Sketching the Solution with a Vision Document 191 Solutions Statements 191 Features and Benefits List 191 Context Diagram 194 Target Business Model 196 High-Level Architectural Diagram 197 Nonfunctional Requirements 197

Segmenting the Project with Subrelease Overview 198 Subrelease Identifier 200 Subrelease Scope 200 Business Process Supported 202 Technical Description 207 Nonfunctional Requirements 208

Providing Developer Guidance with Module Use Cases 209 Goal 209 Standard Flow of Events 209 Alternative Flow of Events 210 Special Requirements 212 Source-to-Target Mappings as Supplemental Specifications 212 Nonfunctional Requirements as Supplemental Specifications 212

11. Intersecting Value Chains for a Stereoscopic Project Definition

Intersecting the Two Value Chains 215 Agile EDW's Version of Requirements

Traceability 215 Addressing Nonfunctional

Requirements 217 The Proper Problem Domain for

Agile EDW 217 Agile EDW Supports Broader

Architectural Activities 219 Supporting the Organization's

Software Release Cycle 221 Phases Borrowed from Rational Unified

Process 221 Iterations -1 and 0 Fit into the Inception

Phase 221

Arriving at a Predevelopment Project Estimate 223 Managing the Predevelopment Estimate 225 Completing the Release Cycle 226 Techniques for the Elaboration Phase 226 Choosing Developer Stories for the Elaboration Phase 226 Proving Out Architectures Using a "Steel Thread" 227 Prioritizing Project Backlogs 228 Managing Incremental Precision 229 A Framework for Visualizing Progressive Requirements 230 The Freezer, Fridge, Counter Metaphor 230 Effort Levels by Team Roles 232 Visualizing Requirements Management Demands with Effort Curves 232 Allocating Time for Nonfunctional Requirements 234 Conquering Complex Business Rules with an Embedded Method 235 Add the Data Cowboy Role 235 Special Skills and Tools for the Data Cowboy 236 Modified Data Mining Method Can Help 236 Placing Business Rules Discovery and Analysis into the Effort Curves 238 Interfacing with Project Governance 239 Not Returning to a Waterfall Approach 242

Part III References 245

Part IV Agile EDW Data Engineering

12. Traditional Data Modeling Paradigms and Their Discontents

EDW at a Crossroads 249 Reviewing the Reference Architecture 249 Standard Normal Forms Lead to Complex Integration Layers 251 Conformed Dimensions Lead to Complex Presentation Layers 253 A Peek at the Agile Alternatives 255 Models, Architectures, and Paradigms 257 Data Architecture 257 Data Model 258 Data Modeling Paradigm 259 Normalization Basics 260 Designing Databases to Eliminate Update Anomalies 260 Example: One Table from First to Fifth Normal Form 262 Generalization Basics 271 Advantages and Disadvantages of Generalization 271 Example: Generalizing a Sales Table for the Party Entity 274 The Standard Approach and its Data Modeling Paradigms 279 The Traditional Integration Layer as a Challenged Concept 281 Involves an Expensive Hidden Layer 281 Results are Difficult to Understand 282 Entails High Maintenance Conversion Costs 283 "Straight-To-Star" as a Controversial Alternative 286 Four Change Cases for Appraising a Data Modeling Paradigm 286 Change Case 1: Correcting Fourth Normal Form Errors 287 Change Case 2: Generalizing to the Party Model 287 Change Case 3: New Trigger Attribute for a Slowly Changing Dimension 289 Change Case 4: Changing a Fact Table's Grain 290

13. Surface Solutions Using Data Virtualization and Big Data

Leveraging Shadow IT 294 Example of a Five-Step Collaborative Effort 294 Lessons from the Case History 296 Faster Value Delivery with Data Virtualization 296 Defining Data Virtualization 297 The Basic Use Case 297 DVS Performance Features 299 The Economics of Virtual Solutions 300 DVS Surface Solutions and Progressive Deployment 302 Comparing DVS Surface Solutions to the Previous Example 304 Data Virtualization's Value Proposition 305 EDW's Reference Architecture Becomes Dynamic 306 An Agile Role for Big Data 308 Introducing Big Data Technologies 308 The Need for Big Data Technology 309 The Promise of Schema-On-Read 310 An Introduction to Hadoop 311

Notable Contrasts between SQL and MapReduce 314 Making MapReduce Look Like SQL with Hive 317 Big Data Is Not Just Hive 324 Using Big Data to Enhance EDW Agility 325

14. Agile Integration Layers with Hyper Normalization

Hyper Normalization Hinges on "Ensemble Modeling" 329 Several Varieties of Hyper Normalization Exist 330 Hyper Normalized Data Modeling Concepts 331 Business Key Entities 333 Linking Entities 334 Attribute Entities 335 Lightly Integrated, Persistent Staging Area 337 Ensemble Modeling Components Allow Light Integration and Agility 339 An Insert-Only Paradigm 342 Swedish Variation: Anchor Modeling 343 Reusable ETL Modules Accelerate New Development 344 One ETL Pattern Needed Per Hyper Normalized Table Type 345 Parameter-Driven ETL Module Prototypes 346 Calling the Reusable ETL Modules 348 Self-Validating Reusable ETL Modules 350 Estimate of Comparative Development Efforts 352 Common Data Retrieval Challenges and Their Solutions 352 HNF Aids the Leading Edge of the Integration Layer Only 353 Retrieving Data from an HNF Repository Doubly Difficult 354 Solution 0: Focus on Presentation Layer Objects 356 Solution 1: Dummy Attribute Records 356 Solution 2: Current Record Indicators 356 Solution 3: Point-in-Time Tables 356 Solution 4: Table Pruning 358 Solution 5: Bridging Tables 359 Solution 6: Retrieval Query Writers 360 Clearing an Architectural Review 361 Re-Architecting the EDW for Hyper Normalization 361 The Simple Vault Style 362 The Enhanced Vault Style 363 The Source Vault Style 364 The Raw Vault Style 364 Blending Styles to Achieve Agility 365 Enabling Evolution of Existing EDW Components 366 Change Case 1: Splitting Out Entities 366 Change Case 2: Upgrading to a Party Model 367 HNF-Powered Agile Solutions 368 Evidence of Success 371 Online Financial Services 372 The Free University 372

15. Fully Agile EDW with Hyper Generalization

Hyper Generalization Involves a Mix of Modeling Strategies 375 Extreme Generalization 377 Adding Time-Oriented Object

Classification 380 Managing Things and Links with an

Associative Data Model 381 Storing Attributes as Name-Value Pairs 384 Storing Transaction Data in a Lightly

Dimensionalized Format 385 Managing Hyper Generalized Data in

HGF Requires an Automation Tool 386 HGF Enables Model-Driven Development

and Fast Deliveries 387 Eliminating Most Logical and Physical

Data Modeling 387 Controlling the EDW Design from a

Business Model Diagram 387 Driving Design Changes Using a Business

Model 389 Loading Data into the Hyper Generalized

Integration Layer 390 Loading the Dimensional Objects 390 Loading the Transactional Objects 391

Retrieving Information from a Hyper Generalized EDW 392 HGF Systems Maintain a Performance

Sublayer 392 Performance Layer Objects Enable

Business-Intelligible Data Retrieval 393 Model-Driven Evolution and Fast

Adaptation 395 Impact of Model Changes on Existing

Data 395 Hyper Generalization Tools Facilitate

Data Conversions 396 Supporting Derived Elements 397

Value-Added Loops 397 Model-Driven Master Data

Components 398 Addressing Performance Concerns 402


Demonstrating Agility Through Four Change Cases 403 Change Case 1: Upgrading Attributes to

Entities 403 Change Case 2: Consolidating Entities

into the Party Model 406 Change Case 3: New Trigger for a Slowly

Changing Dimension 409 Change Case 4: Increasing the Grain

of a Fact Table 410 Recap of Change Case Findings 413

HGF-Powered Agile Solutions 414 Easier Backfills for Surface Solutions 415

Evidence of Success 416 Case History 1: Model-Driven

Development in Pharmaceuticals 416 Case History 2: Hyper Generalized Data

Warehousing in Specialty Retail 417

Part IV References 421

Part V Agile EDW Quality Management Planning

16. Why We Test and What Tests to Run

Why Test? 426 Testing Keeps Agile Teams from Cutting Corners 426 Testing Keeps Root Cause Analysis Manageable 427 Testing Integrates Teamwork Across the Pipeline 428 Testing Leads to Better Requirements 428 Testing Makes Real Progress Visible to Everyone 428 An Agile Approach to Quality Assurance 429 Striving for Balance 429 Keeping Quality Assurance "Agile" 430 Extending Test-Led Development Far Above Unit Testing 432 "What to Test?" Answered with Top-Down Planning 433 The Six Dimensions of DW/BI Testing 433 Preliminary Definitions 435 Dimension 1: Planning 436 Dimension 2: System 437 Dimension 3: Functional 439 Dimension 4: Polarity 439

Dimension 5: Time Frame 440 Dimension 6: Point-of-View 440

A 2 X 2 Planning Matrix for Top-Down Test Selection 441 A Framework for Assessing a QA Plan's

Coverage 441 Linking Test Planning to Requirements

and Risk Management 443 "What to Test?" Answered Bottom-Up 444

Data Warehousing Testing Techniques 444 Traditional Application Testing

Techniques 446 Agile-Specific Test Techniques 449 An Easy-to-Follow Test Technique

Matrix for Low-Level Validations 451 Reusable Test Widgets 452 Test Cases Roll Forward Along the

System Dimension 453 Testing for Convergence 453

17. Designating Who, When, and Where

Who Shall Write the Tests? 457 A Framework for Understanding Who Must Do What 458 When Should Teammates Perform Their QA Duties? 463 Quality Activities Within an Iteration Cycle 464 Quality Duties at the End of a Release Cycle 466 Where Should Teammates Perform Their QA Duties? 468 Distributing Test Activities Across Environments 468 Distributing Test Techniques Across Environments 469 Key Quality Responsibilities by Team Role 470 Guiding the Team to Self-Organized Quality Planning 470 Suggested Quality Duties by Role 471 The Overarching Duties of the System Tester 473 Certifying the User Demo's Data 474 How Many Testers are Needed? 475

18. Deciding How to Execute the Test Cases

Good Agile Quality Plans Involve Numerous Test Executions 477


Alternatives to Sufficient Testing Unattractive 480

Facing Up to Test Automation 481 Step 1: Update the Top-Down Plan 482 Step 2: Start Building the Parameter-Driven

Widgets 482 Step 3: Plan Out the Test Data Sets 482

Identifying How Many Data Sets are Required 484

Planning to Create Dozens of Data Sets 485 Planning Storage for Dozens of

Data Sets 487 Planning also for Expected Results 487

Step 4: Implement the Engine, Whether Manual or Automated 487

Defining Test Scenarios 489 Step 5: Define the Project's Set of Testing

Aspects 489 Step 6: Build and Populate the Test Data

Repository 490 Step 7: Quantify the Testing Objectives 491 Step 8: Begin Creating Test Cases 493 Step 9: Start Up the Engine 493 Step 10: Visualize Project Progress with

Quality Assurance 494 Tests Implemented by Environment 494 Connect Top-Down and Bottom-Up

Quality Planning 496 Defects Over Time 496 Current Iteration Burndown Chart 496

Step 11: Document the Team's Success 497

Part V References 499

Part VI Integrating the Pieces of the Agile EDW Method

19. The Agile EDW Subrelease Cycle

Making the Release Cycle a Repeatable Process 503

Traditional Notions of Data Governance 504 A Life Cycle for Data Governance 505 Data Governance Actions for the EDW

Team 508 Machine-Assisted Data Governance

for the Subrelease Cycle 509 The Agile EDW Subrelease Value Cycle 510

The Fast Requirements Portion of the Subrelease Cycle 511

The Fast Delivery Portion of the Subrelease Cycle 512

Centering the Value Cycle on Data Governance and Quality 514 Deepening the Support for Data

Governance 514 Achieving World-Class Quality

Assurance 515 Guiding the Agile EDW Transition 515

The DW/BI Customer's Bill of Rights 516 Toward an Agile EDW Manifesto 518

Part VI References 521

Index 523