Copyright © 2014, Sira Yongchareon
Department of Computing, Faculty of Creative Industries and Business
Lecturer : Dr. Sira Yongchareon
ISCG 6425 Data Warehousing (DW)
Week 10: Other topics in DW

ISCG6425 Data Warehousing, Semester 1, 2014 (by Sira Yongchareon), Department of Computing, Faculty of Creative Industries and Business

Outline
- Advanced dimensional modeling
  - Slowly Changing Dimensions
  - Data Hierarchy
- Physical Database Design
- OLAP Cubes and Operations

Slowly Changing Dimension (SCD)
- A Slowly Changing Dimension (SCD) is a dimension whose values change slowly over time, rather than on a regular, time-based schedule.
- Changes in dimension attributes need to be tracked in order to report on historical data.
- For example, how would you handle customer dimension data if a customer changes address from New Zealand to Australia?

Before the change:

| CustomerKey | CustomerID | Name | Country |
| --- | --- | --- | --- |
| 1 | John1 | John | New Zealand |

After the change:

| CustomerKey | CustomerID | Name | Country |
| --- | --- | --- | --- |
| 1 | John1 | John | Australia |

Slowly Changing Dimension (SCD)
SCD has 6 types:
- Type 0: the passive method
- Type 1: overwriting the old value
- Type 2: creating a new additional record
- Type 3: adding a new column
- Type 4: using a historical table
- Type 6: combining the approaches of Types 1, 2 and 3 (1 + 2 + 3 = 6)

Type 6 is named after the sum 1 + 2 + 3 = 6, which is why this list has no Type 5.

SCD Type 0
SCD Type 0: the passive method
- No special action is performed when dimension values change.
- Some dimension data remains as it was when first inserted; other data may be overwritten.

SCD Type 1
SCD Type 1: overwriting the old value
- NO history of dimension changes is kept in the database.
- The old dimension value is simply overwritten by the new one.
- Easy to maintain, and often used for data whose changes are caused by processing corrections (e.g., a misspelling).

Before the correction:

| CustomerKey | CustomerID | Name | Country |
| --- | --- | --- | --- |
| 1 | John1 | John | New Sealand |

After the correction:

| CustomerKey | CustomerID | Name | Country |
| --- | --- | --- | --- |
| 1 | John1 | John | New Zealand |

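
The Type 1 overwrite can be sketched with an in-memory SQLite database. This is a minimal illustration, not the course's implementation: the table name `dimCustomers` is borrowed from the OLAP query later in these slides, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed for illustration.
conn.execute(
    "CREATE TABLE dimCustomers ("
    "CustomerKey INTEGER PRIMARY KEY, CustomerID TEXT, Name TEXT, Country TEXT)"
)
conn.execute("INSERT INTO dimCustomers VALUES (1, 'John1', 'John', 'New Sealand')")

# Type 1: overwrite the value in place. The misspelled value is lost.
conn.execute(
    "UPDATE dimCustomers SET Country = ? WHERE CustomerID = ?",
    ("New Zealand", "John1"),
)

type1_rows = conn.execute("SELECT * FROM dimCustomers").fetchall()
print(type1_rows)
```

The row keeps its surrogate key, so fact rows that reference `CustomerKey = 1` need no change.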
SCD Type 2
SCD Type 2: creating a new additional record
- All history of dimension changes is kept in the database.
- An attribute change is captured by adding a new row, with a new surrogate key, to the dimension table.
- 'Effective date' columns (StartDate, EndDate) and a 'current indicator' column (Flag) are also used.

Before the change:

| CustomerKey | CustomerID | Name | Country | StartDate | EndDate | Flag |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | John1 | John | New Zealand | 01/01/2014 | 31/01/2014 | Y |

After the change:

| CustomerKey | CustomerID | Name | Country | StartDate | EndDate | Flag |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | John1 | John | New Zealand | 01/01/2014 | 31/12/2014 | N |
| 2 | John1 | John | Australia | 01/01/2015 | 31/12/2015 | Y |

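
A Type 2 change can be sketched as "expire the current row, then insert a new current row". The helper `apply_type2_change` is hypothetical, and two conventions here are assumptions rather than something the slide prescribes: dates are stored as ISO strings, and the current row has a NULL `EndDate` (the slide shows an explicit end date instead).

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dimCustomers ("
    "CustomerKey INTEGER PRIMARY KEY AUTOINCREMENT, CustomerID TEXT, Name TEXT, "
    "Country TEXT, StartDate TEXT, EndDate TEXT, Flag TEXT)"
)
conn.execute(
    "INSERT INTO dimCustomers (CustomerID, Name, Country, StartDate, EndDate, Flag) "
    "VALUES ('John1', 'John', 'New Zealand', '2014-01-01', NULL, 'Y')"
)

def apply_type2_change(conn, customer_id, new_country, change_date):
    """Expire the current row and add a new current row (hypothetical helper)."""
    name = conn.execute(
        "SELECT Name FROM dimCustomers WHERE CustomerID = ? AND Flag = 'Y'",
        (customer_id,),
    ).fetchone()[0]
    day_before = (change_date - timedelta(days=1)).isoformat()
    # Close the old version: set its end date and clear the current flag.
    conn.execute(
        "UPDATE dimCustomers SET EndDate = ?, Flag = 'N' "
        "WHERE CustomerID = ? AND Flag = 'Y'",
        (day_before, customer_id),
    )
    # Insert the new version; the database assigns a new surrogate key.
    conn.execute(
        "INSERT INTO dimCustomers (CustomerID, Name, Country, StartDate, EndDate, Flag) "
        "VALUES (?, ?, ?, ?, NULL, 'Y')",
        (customer_id, name, new_country, change_date.isoformat()),
    )

apply_type2_change(conn, "John1", "Australia", date(2015, 1, 1))
type2_rows = conn.execute(
    "SELECT CustomerKey, Country, StartDate, EndDate, Flag "
    "FROM dimCustomers ORDER BY CustomerKey"
).fetchall()
print(type2_rows)
```

Facts loaded before the change keep pointing at `CustomerKey = 1`, so historical reports still attribute them to New Zealand.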
SCD Type 3
SCD Type 3: adding a new column
- Only the current and previous values of the dimension attribute are kept in the database.
- The new value is loaded into the 'current' column and the old one into the 'previous' column.
- History is limited to the number of columns created for storing historical data.
- The least commonly used technique.

Before the change:

| CustomerKey | CustomerID | Name | Current Country | Previous Country |
| --- | --- | --- | --- | --- |
| 1 | John1 | John | New Zealand | |

After the change:

| CustomerKey | CustomerID | Name | Current Country | Previous Country |
| --- | --- | --- | --- | --- |
| 1 | John1 | John | Australia | New Zealand |

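
A Type 3 change fits in a single `UPDATE` that shifts the current value into the previous column. This is a minimal sketch; the column names `CurrentCountry` and `PreviousCountry` are assumed spellings of the slide's headers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dimCustomers ("
    "CustomerKey INTEGER PRIMARY KEY, CustomerID TEXT, Name TEXT, "
    "CurrentCountry TEXT, PreviousCountry TEXT)"
)
conn.execute(
    "INSERT INTO dimCustomers VALUES (1, 'John1', 'John', 'New Zealand', NULL)"
)

# Type 3: the right-hand sides are evaluated against the row as it was
# before the update, so PreviousCountry receives the old CurrentCountry.
conn.execute(
    "UPDATE dimCustomers "
    "SET PreviousCountry = CurrentCountry, CurrentCountry = ? "
    "WHERE CustomerID = ?",
    ("Australia", "John1"),
)

type3_rows = conn.execute("SELECT * FROM dimCustomers").fetchall()
print(type3_rows)
```

A second change would overwrite `PreviousCountry`, which is the limited history the slide refers to.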
SCD Type 4
SCD Type 4: using a historical table
- A separate historical table is used to track all historical changes to the dimension's attributes.
- The 'main' dimension table keeps only the current data.

Main table:

| CustomerKey | CustomerID | Name | Country |
| --- | --- | --- | --- |
| 1 | John1 | John | Australia |

Historical table:

| CustomerKey | CustomerID | Name | Country | StartDate | EndDate |
| --- | --- | --- | --- | --- | --- |
| 1 | John1 | John | New Zealand | 01/01/2014 | 31/12/2014 |
| 1 | John1 | John | Australia | 01/01/2015 | 31/12/2015 |

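
The two-table arrangement can be sketched as follows. The table name `dimCustomersHistory`, the helper `apply_type4_change`, ISO date strings, and a NULL `EndDate` for the open history row are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dimCustomers ("
    "CustomerKey INTEGER PRIMARY KEY, CustomerID TEXT, Name TEXT, Country TEXT)"
)
conn.execute(
    "CREATE TABLE dimCustomersHistory ("
    "CustomerKey INTEGER, CustomerID TEXT, Name TEXT, Country TEXT, "
    "StartDate TEXT, EndDate TEXT)"
)
conn.execute("INSERT INTO dimCustomers VALUES (1, 'John1', 'John', 'New Zealand')")
conn.execute(
    "INSERT INTO dimCustomersHistory "
    "VALUES (1, 'John1', 'John', 'New Zealand', '2014-01-01', NULL)"
)

def apply_type4_change(conn, customer_id, new_country, old_end, new_start):
    """Overwrite the main row and append a history row (hypothetical helper)."""
    # Close the open history row.
    conn.execute(
        "UPDATE dimCustomersHistory SET EndDate = ? "
        "WHERE CustomerID = ? AND EndDate IS NULL",
        (old_end, customer_id),
    )
    # The main table keeps only the current value.
    conn.execute(
        "UPDATE dimCustomers SET Country = ? WHERE CustomerID = ?",
        (new_country, customer_id),
    )
    # Copy the new current state into the history table as an open row.
    conn.execute(
        "INSERT INTO dimCustomersHistory "
        "SELECT CustomerKey, CustomerID, Name, Country, ?, NULL "
        "FROM dimCustomers WHERE CustomerID = ?",
        (new_start, customer_id),
    )

apply_type4_change(conn, "John1", "Australia", "2014-12-31", "2015-01-01")
main_rows = conn.execute("SELECT * FROM dimCustomers").fetchall()
history_rows = conn.execute(
    "SELECT Country, StartDate, EndDate FROM dimCustomersHistory ORDER BY StartDate"
).fetchall()
print(main_rows, history_rows)
```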
SCD Type 6
SCD Type 6: combining the approaches of Types 1, 2 and 3
- Type 1 = overwrite the old value
- Type 2 = add a new record
- Type 3 = add a new column

| CustomerKey | CustomerID | Name | CurrentCountry | HistoricalCountry | StartDate | EndDate | Flag |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | John1 | John | New Zealand | | 01/01/2014 | 31/12/2014 | N |
| 2 | John1 | John | Australia | New Zealand | 01/01/2015 | 31/12/2015 | Y |

Data Hierarchy
- Date (Time) dimensions: Year > Quarter > Month > Week > Day
- Location dimensions: Region > Country > City
- Product dimensions: Category > Product Type > Product

Dimension Modeling: A Summary
- Dimensional modeling
  - Dimensions: de-normalized (Star) or normalized (Snowflake and Fact Constellation)
- Slowly Changing Dimension (6 types)
  - An interesting tutorial: http://www.youtube.com/watch?v=Eam2SmYgIzg
- Data Hierarchy

Physical Database Design
- Database tables, indexes, partitions, summary tables
- Table design
  - Dimension tables and fact tables
  - PKs, FKs, surrogate/natural keys, and constraints
- Partition design
  - Sort and group data into different partitions
  - Helps speed up queries and improve scalability
- Index design
  - To speed up queries

Why do we have to care so much about query speed in a DW?

Partition Design
- Partitioning
  - Split a table into several smaller tables
  - Partitions can be stored in a single database or in multiple databases
  - Improves scalability (when storing data) and performance (when storing and querying data)
  - Think about a fact table that contains 1 billion records!
- Approaches
  - Vertical partitioning: each small table contains some columns of the original table
  - Horizontal partitioning: each small table contains some rows of the original table

Partition Design
- Vertical partitioning: each small table contains some columns of the original table

Partition Design
- Horizontal partitioning: each small table contains some rows of the original table

Which partitioning approach (horizontal or vertical) helps a DW database the most?

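
The difference between the two approaches can be sketched on a tiny in-memory fact table. The rows, column names, and both helper functions below are hypothetical; a real DBMS would do this through its own partitioning features rather than application code.

```python
from collections import defaultdict

# Hypothetical fact rows; the figures are made up.
fact_orders = [
    {"OrderKey": 1, "Year": 1996, "ProductKey": 10, "TotalPrice": 120.0},
    {"OrderKey": 2, "Year": 1997, "ProductKey": 11, "TotalPrice": 80.0},
    {"OrderKey": 3, "Year": 1997, "ProductKey": 10, "TotalPrice": 45.5},
]

def partition_horizontally(rows, key):
    """Group whole rows by the value of one column (here: one partition per year)."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)

def partition_vertically(rows, primary_key, column_groups):
    """Split columns into narrower tables; each keeps the primary key for re-joining."""
    return [
        [{col: row[col] for col in [primary_key, *cols]} for row in rows]
        for cols in column_groups
    ]

by_year = partition_horizontally(fact_orders, "Year")
narrow_tables = partition_vertically(
    fact_orders, "OrderKey", [["Year", "ProductKey"], ["TotalPrice"]]
)
print(sorted(by_year), narrow_tables[1][0])
```

With the horizontal split, a query restricted to 1997 only has to read the 1997 partition; with the vertical split, every row is still present in each narrow table.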
Index Design
- An index is useful and speeds up processing when a column is used for searching or matching.
- In the query below, Country is used for searching in the WHERE clause, so indexing the Country column can make the query run faster.

    SELECT * FROM Customers WHERE Country = 'New Zealand'

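
One way to see the effect of an index is to compare the query plan before and after creating it. The sketch below uses SQLite purely for illustration; the index name `idx_customers_country`, the helper `plan`, and the second customer row are hypothetical, and other database engines expose query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID TEXT, Name TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [("John1", "John", "New Zealand"), ("Mary1", "Mary", "Australia")],
)

query = "SELECT * FROM Customers WHERE Country = 'New Zealand'"

def plan(conn, sql):
    """Return SQLite's textual query plan (the 'detail' column of each plan row)."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(conn, query)  # without an index: a full table scan
conn.execute("CREATE INDEX idx_customers_country ON Customers (Country)")
plan_after = plan(conn, query)   # with the index: a search using it

print(plan_before)
print(plan_after)
```

The exact plan wording varies between SQLite versions, so the sketch only relies on whether the index name appears in it.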
Index Design
What about queries in an OLAP database?

    SELECT p.ProductName, SUM(f.TotalPrice) AS [Total Revenue]
    FROM dimProducts p, dimCustomers c, factOrders f, dimTime t
    WHERE f.ProductKey = p.ProductKey
      AND f.CustomerKey = c.CustomerKey
      AND f.OrderDateKey = t.TimeKey
      AND c.Country = 'UK'
      AND t.QuaterOfYear = 1
      AND t.Year IN (1996, 1997, 1998)
    GROUP BY p.ProductName
    ORDER BY p.ProductName

From the above query, which columns should be indexed?

OLAP Cubes
OLAP Cubes
- An OLAP cube is an array of data understood in terms of its zero or more dimensions.
- You can make an OLAP cube from any DW schema.
  - For example, a star schema with 5 dimensions can be turned into a cube with 3 dimensions.

From http://visibledata.wordpress.com/data/datacloud/datacube/

OLAP Cubes
What does a cube represent?
- Dimensions
- Data cell = the fact that relates to all dimensions

Cube Operations
With a cube, you can:
- Slice
- Dice
- Drill down
- Roll up
- Pivot

Cube Operations: Slice
- The slice operation creates a rectangular subset of a cube with one fewer dimension, by choosing a single value for one of its dimensions.
- The number of dimensions is reduced by one, e.g., from 3 dimensions to 2.

From http://www.tutorialspoint.com/dwh/dwh_olap.htm

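
A slice can be sketched on a cube stored as a dictionary from coordinates to a measure. The cube contents and the helper `slice_cube` are hypothetical, and the figures are made up.

```python
# Hypothetical sales cube: (quarter, city, item) -> units sold.
cube = {
    ("Q1", "Auckland", "Mobile"): 60,
    ("Q1", "Auckland", "Modem"): 25,
    ("Q1", "Sydney", "Mobile"): 80,
    ("Q2", "Auckland", "Mobile"): 70,
    ("Q2", "Sydney", "Modem"): 30,
}

def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop that dimension from the keys."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }

# Slice on time = Q1: the result has 2 dimensions (city, item) instead of 3.
q1_slice = slice_cube(cube, axis=0, value="Q1")
print(q1_slice)
```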
Cube Operations: Dice
- The dice operation produces a subcube by allowing the analyst to pick specific values of multiple dimensions.
- No dimension is removed.

From http://www.tutorialspoint.com/dwh/dwh_olap.htm

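
A dice can be sketched as a filter that keeps every dimension but restricts some of them to chosen values. The cube contents and the helper `dice_cube` are hypothetical, and the figures are made up.

```python
# Hypothetical sales cube: (quarter, city, item) -> units sold.
cube = {
    ("Q1", "Auckland", "Mobile"): 60,
    ("Q1", "Sydney", "Modem"): 25,
    ("Q2", "Auckland", "Mobile"): 70,
    ("Q2", "Sydney", "Mobile"): 80,
    ("Q3", "Auckland", "Mobile"): 90,
    ("Q3", "Sydney", "Modem"): 30,
}

def dice_cube(cube, selections):
    """Keep cells whose coordinate on each listed axis is among the chosen values."""
    return {
        key: measure
        for key, measure in cube.items()
        if all(key[axis] in values for axis, values in selections.items())
    }

# Dice: time in {Q1, Q2} and item in {Mobile}; all three dimensions remain.
sub_cube = dice_cube(cube, {0: {"Q1", "Q2"}, 2: {"Mobile"}})
print(sub_cube)
```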
Cube Operations: Drill Down
- The drill-down operation navigates among levels of data, from the summarized to the more detailed.
- No dimension is removed.

From http://www.tutorialspoint.com/dwh/dwh_olap.htm

Cube Operations: Roll up
- The roll-up operation summarizes the data along a dimension (by aggregation).
- Similar to GROUP BY.

From http://www.tutorialspoint.com/dwh/dwh_olap.htm

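
A roll-up along the location hierarchy (City to Country) can be sketched as a grouped sum. The sales figures and the helper `roll_up` are hypothetical.

```python
from collections import defaultdict

# Hypothetical cells at city level: (city, item) -> units sold.
city_sales = {
    ("Auckland", "Mobile"): 60,
    ("Wellington", "Mobile"): 40,
    ("Sydney", "Mobile"): 80,
    ("Auckland", "Modem"): 25,
}
# One step up the hierarchy: each city belongs to a country.
country_of = {
    "Auckland": "New Zealand",
    "Wellington": "New Zealand",
    "Sydney": "Australia",
}

def roll_up(cells, axis, parent_of):
    """Replace each member on `axis` by its parent and sum the measures."""
    totals = defaultdict(int)
    for key, measure in cells.items():
        rolled_key = key[:axis] + (parent_of[key[axis]],) + key[axis + 1:]
        totals[rolled_key] += measure
    return dict(totals)

country_sales = roll_up(city_sales, axis=0, parent_of=country_of)
print(country_sales)
```

Drill down goes the other way: it returns from `country_sales` to the more detailed `city_sales`, which is only possible if the detailed cells are still stored.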
Cube Operations: Pivot
- The pivot operation rotates the cube in space to see its various faces.

From http://www.tutorialspoint.com/dwh/dwh_olap.htm

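
On a two-dimensional face of a cube, a pivot amounts to swapping which dimension labels the rows and which labels the columns. The data and the helpers `pivot` and `as_grid` are hypothetical.

```python
# Hypothetical 2-D face of a cube: (city, item) -> units sold.
sales = {
    ("Auckland", "Mobile"): 60,
    ("Auckland", "Modem"): 25,
    ("Sydney", "Mobile"): 80,
}

def pivot(cells, new_order):
    """Reorder the dimensions of every coordinate; the measures are unchanged."""
    return {tuple(key[i] for i in new_order): m for key, m in cells.items()}

def as_grid(cells):
    """Lay 2-D cells out as {row_label: {column_label: measure}}."""
    grid = {}
    for (row, col), measure in cells.items():
        grid.setdefault(row, {})[col] = measure
    return grid

by_city = as_grid(sales)                 # rows = cities, columns = items
by_item = as_grid(pivot(sales, (1, 0)))  # rows = items, columns = cities
print(by_city)
print(by_item)
```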
A friendly reminder: "Assignment 2"
- Submission
  - Both Phase-1 and Phase-2 (separate submissions)
  - Due Monday 26 October at 9:30am
  - Next week has a workshop
  - Marking of Phase-2 does NOT rely on the DQLog produced in Phase-1
- Interview sessions
  - Will be conducted case by case (not all students are required)
  - Maximum penalty for "cheating"

What's next?
- Assignment 2 Q/A
- Continue working on worksheets
  - Last chance to work and submit…

Questions?