A Generalized Lesson in ETL Architecture
Presented by Wes Dumey
Durable Impact Consulting, Inc.
June 11, 2007
Agenda
1. ETL Overview
2. When is ETL Appropriate?
3. Tools vs. Hard Coding
4. ETL Architecture
• ETL Overview – 20 Minutes
• ETL Design Tips – 20 Minutes
• Demonstration – 20 Minutes
• Ask questions at any time
Speaker Biography
• Senior Consultant, Durable Impact Consulting, Inc.
• Experience on high-performance data warehouses
• Education
– B.S. in Computer Information Systems
  • Missouri State University
– M.A. in Business Economics (in progress)
  • University of South Florida
• External interests: aviation (private pilot) and economics
ETL Overview
• Extract, Transform, and Load (ETL) is used to populate a data warehouse
• Extract is where data is pulled from source systems
– SQL connections over the network
– Flat files
– Transaction messaging (MSMQ)
• Transformations can be the most complex part of data warehousing
– Convert text to numbers
– Apply business logic in this stage
• Load is where data is loaded into the data warehouse
– Sequential or bulk loading
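The three stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the approach used in the demonstration (which uses SSIS): the source is an in-memory flat file, the target is an in-memory SQLite table, and the names `SOURCE_CSV`, `fact_orders`, `extract`, `transform`, and `load` are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract; a real source could be a file, a SQL query, or a queue.
SOURCE_CSV = """order_id,amount,region
1001, 19.99 ,east
1002,5.00,WEST
1003,not-a-number,east
"""

def extract(text):
    """Extract: read rows from a flat file (here an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert text to numbers and normalize the region code."""
    clean, rejected = [], []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            rejected.append(row)  # keep bad rows aside instead of dropping them silently
            continue
        clean.append((int(row["order_id"]), amount, row["region"].strip().upper()))
    return clean, rejected

def load(conn, rows):
    """Load: bulk-insert the transformed rows in a single transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, region TEXT)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
clean, rejected = transform(extract(SOURCE_CSV))
load(conn, clean)
```

Here `executemany` stands in for bulk loading; a sequential load would insert one row per statement instead.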
ETL?
• Many companies find value in the graphical representation of data and use it in other applications as well
• ETL is very efficient when designed correctly
ETL Tools vs. Hard Coding
• Many shops still hard-code their ETL (triggers, procedures, code blocks)
• Hard-coded ETL is hard to maintain
• Hard-coded ETL is hard to scale properly
• ETL tools make flows easy to visualize
• With SSIS, there is no good reason not to use an ETL tool
What is going on here?
ETL Design Methodology
• Steps for successful ETL Design
1. Clear and concise requirements
2. Modularized design
3. Data cleansing capability
4. High Emphasis on Data Quality
5. Functional Testing
6. Sufficient Documentation
• See the methodology document
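One way to make the data quality emphasis concrete is to express quality rules as small, named, individually testable checks. The sketch below is an assumption about how that could look, not something taken from the methodology document; the rule names, the record fields, and `check_quality` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Tuple[str, Callable[[Record], bool]]

# Hypothetical rules; real ones would come from the requirements.
RULES: List[Rule] = [
    ("customer_id is present", lambda r: r.get("customer_id") is not None),
    (
        "amount is non-negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]

def check_quality(records, rules=RULES):
    """Split records into those passing every rule and those failing at least one."""
    passed, failures = [], []
    for rec in records:
        broken = [name for name, test in rules if not test(rec)]
        if broken:
            failures.append((rec, broken))  # keep the reasons for later review
        else:
            passed.append(rec)
    return passed, failures

records = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": None, "amount": 5.0},
    {"customer_id": 3, "amount": -2.0},
]
passed, failures = check_quality(records)
```

Keeping each rule as a named function means a rule can be tested on its own and listed in the documentation by name.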
ETL Methodology Steps
1. Extract the data – pulls data
2. Load PSA and audit tables
3. Source Load Temp – sources and cleanses data
4. Lookup Dimensions – extract records for update
5. Lookup Facts
6. Transform Facts
7. Transform Dimensions
8. Quality Check
9. Load
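The nine steps above lend themselves to a modular runner in which each step is a separate function and every run leaves an audit record. The following is a rough sketch under that assumption: the step bodies are placeholders, and the `etl_audit` table, `run_pipeline`, and `make_placeholder` are hypothetical names rather than part of the methodology.

```python
import sqlite3
from datetime import datetime, timezone

# Step names follow the slide; the implementations below are placeholders.
STEP_NAMES = [
    "extract",
    "load_psa_and_audit",
    "source_load_temp",
    "lookup_dimensions",
    "lookup_facts",
    "transform_facts",
    "transform_dimensions",
    "quality_check",
    "load",
]

def make_placeholder(name):
    """Build a stand-in step that only records that it ran."""
    def step(context):
        context["completed"].append(name)
    return step

STEPS = [(name, make_placeholder(name)) for name in STEP_NAMES]

def run_pipeline(conn, steps):
    """Run steps in order, writing one audit row per step and stopping on failure."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_audit (step TEXT, status TEXT, finished_at TEXT)"
    )
    context = {"completed": []}
    for name, step in steps:
        try:
            step(context)
            status = "ok"
        except Exception:
            status = "failed"
        with conn:
            conn.execute(
                "INSERT INTO etl_audit VALUES (?, ?, ?)",
                (name, status, datetime.now(timezone.utc).isoformat()),
            )
        if status == "failed":
            break  # later steps depend on earlier ones, so do not continue
    return context

conn = sqlite3.connect(":memory:")
context = run_pipeline(conn, STEPS)
audit = conn.execute("SELECT step, status FROM etl_audit ORDER BY rowid").fetchall()
```

Replacing a placeholder with a real function does not change the runner, which is the point of the modular design.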
Design Considerations
• Naming conventions and comments
• Standard approaches allow for:
– Quick, micro-batch processing (if desired)
– Ability to pause/resume, resurrection
• Data cleansing
• Legal requirements (HIPAA, SOX)
• Industry-standard best practices
• Data retention
– Archive vs Purge
• Quality
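Micro-batch processing with pause/resume can be approximated with a checkpoint that records the next batch to process. The sketch below is one possible approach, assuming a small JSON checkpoint file; `read_checkpoint`, `write_checkpoint`, `run`, and the `stop_after` parameter (used here only to simulate a pause) are hypothetical.

```python
import json
import os
import tempfile

def read_checkpoint(path):
    """Return the index of the next batch to process (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_batch"]

def write_checkpoint(path, next_batch):
    """Write the checkpoint to a temp file, then swap it in with a single rename."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_batch": next_batch}, f)
    os.replace(tmp, path)

def run(batches, path, process, stop_after=None):
    """Process batches starting at the checkpoint; stop_after simulates a pause."""
    start = read_checkpoint(path)
    done = 0
    for i in range(start, len(batches)):
        if stop_after is not None and done >= stop_after:
            break
        process(batches[i])
        write_checkpoint(path, i + 1)  # advance only after the batch succeeds
        done += 1

batches = [[1, 2], [3, 4], [5, 6]]
processed = []
checkpoint_path = os.path.join(tempfile.mkdtemp(), "etl_checkpoint.json")

run(batches, checkpoint_path, processed.extend, stop_after=2)  # simulated pause
run(batches, checkpoint_path, processed.extend)                # resume from checkpoint
```

Because the checkpoint advances only after a batch succeeds, a crash between processing and checkpointing replays that batch on resume, so the load step should tolerate being run twice on the same batch.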
Demonstration
Let’s Get Started
• Gather Functional Requirements
• Build the Data Model
• Write Technical Specification
• Construct
• Test
Follows the systems development lifecycle
Closing Info
• Presenter Information
• Blog
– www.thedamndata.com “A techies’ discussion of databases, datawarehouses, and the damn data itself”
– Covering SQL Server 2005, Oracle, IBM WebSphere DataStage ETL tool, SSIS, and whatever the hell else is on my mind
– Check it out – funny and hopefully informative
• Corporate Information
– www.durableimpact.com – Durable Impact Consulting
• Presenting finalized EDW at Tampa Code Camp