Copyright© 2014, Sira Yongchareon Department of Computing, Faculty of Creative Industries and Business Lecturer : Dr. Sira Yongchareon ISCG 6425 Data Warehousing (DW) Week 10 Other topics in DW


Page 1: ISCG 6425 Data Warehousing (DW)

Week 10: Other topics in DW

Lecturer: Dr. Sira Yongchareon

Copyright © 2014, Sira Yongchareon
Department of Computing, Faculty of Creative Industries and Business

Page 2: Outline

Advanced dimensional modeling
  Slowly Changing Dimensions
  Data hierarchy
Physical database design
OLAP cubes and operations

Page 3: Slowly Changing Dimension (SCD)

A Slowly Changing Dimension (SCD) is a dimension whose attribute values change slowly over time, rather than on a regular, time-based schedule.

Changes in dimension attributes need to be tracked in order to report historical data.

For example, how would you handle the customer dimension if a customer changes address from New Zealand to Australia?

Before the change:

CustomerKey  CustomerID  Name  Country
1            John1       John  New Zealand

After the change:

CustomerKey  CustomerID  Name  Country
1            John1       John  Australia

Page 4: Slowly Changing Dimension (SCD)

SCD has 6 types:
  Type 0 - The passive method
  Type 1 - Overwriting the old value
  Type 2 - Creating a new additional record
  Type 3 - Adding a new column
  Type 4 - Using a historical table
  Type 6 - Combining the approaches of Types 1, 2 and 3 (1 + 2 + 3 = 6)

Type 6 takes its number from 1 + 2 + 3, which is why this list has no Type 5.

Page 5: SCD Type 0

SCD Type 0 - The passive method

No special action is performed when dimension data changes.

Some dimension data remains as it was when first inserted; other data may be overwritten.

Page 6: SCD Type 1

SCD Type 1 - Overwriting the old value

No history of dimension changes is kept in the database.

The old dimension value is simply overwritten by the new one.

This type is easy to maintain and is often used when the change is a processing correction (e.g., a misspelling).

Before the correction:

CustomerKey  CustomerID  Name  Country
1            John1       John  New Sealand

After the correction:

CustomerKey  CustomerID  Name  Country
1            John1       John  New Zealand
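The Type 1 overwrite can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name dimCustomers is borrowed from the OLAP query later in the deck, and the helper apply_type1 is a hypothetical name, not something the slides define.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dimCustomers ("
    "CustomerKey INTEGER PRIMARY KEY, CustomerID TEXT, Name TEXT, Country TEXT)"
)
# Start from the misspelled row shown on the slide.
conn.execute("INSERT INTO dimCustomers VALUES (1, 'John1', 'John', 'New Sealand')")


def apply_type1(conn, customer_id, country):
    """Overwrite Country in place; the old value is not kept anywhere."""
    conn.execute(
        "UPDATE dimCustomers SET Country = ? WHERE CustomerID = ?",
        (country, customer_id),
    )


apply_type1(conn, "John1", "New Zealand")
type1_rows = conn.execute("SELECT * FROM dimCustomers").fetchall()
print(type1_rows)
```

After the update there is still exactly one row for John1, and nothing in the table records that the value was ever different.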

Page 7: SCD Type 2

SCD Type 2 - Creating a new additional record

The full history of dimension changes is kept in the database.

An attribute change is captured by adding a new row, with a new surrogate key, to the dimension table.

'Effective date' and 'current indicator' columns are also used.

Before the change:

CustomerKey  CustomerID  Name  Country      StartDate   EndDate     Flag
1            John1       John  New Zealand  01/01/2014  31/01/2014  Y

After the change:

CustomerKey  CustomerID  Name  Country      StartDate   EndDate     Flag
1            John1       John  New Zealand  01/01/2014  31/12/2014  N
2            John1       John  Australia    01/01/2015  31/12/2015  Y
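A Type 2 change can be sketched as "close the current row, then insert a new one". The code below is a minimal illustration using sqlite3; the table name dimCustomers, the helper apply_type2 and its parameters are assumptions, and the dates are passed in explicitly so that the result matches the slide's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dimCustomers ("
    "CustomerKey INTEGER PRIMARY KEY AUTOINCREMENT, "  # surrogate key
    "CustomerID TEXT, "                                # natural key
    "Name TEXT, Country TEXT, StartDate TEXT, EndDate TEXT, Flag TEXT)"
)
conn.execute(
    "INSERT INTO dimCustomers (CustomerID, Name, Country, StartDate, EndDate, Flag) "
    "VALUES ('John1', 'John', 'New Zealand', '01/01/2014', '31/01/2014', 'Y')"
)


def apply_type2(conn, customer_id, new_country, old_end, new_start, new_end):
    """Close the current row, then add a new current row with a new key."""
    (name,) = conn.execute(
        "SELECT Name FROM dimCustomers WHERE CustomerID = ? AND Flag = 'Y'",
        (customer_id,),
    ).fetchone()
    conn.execute(
        "UPDATE dimCustomers SET EndDate = ?, Flag = 'N' "
        "WHERE CustomerID = ? AND Flag = 'Y'",
        (old_end, customer_id),
    )
    conn.execute(
        "INSERT INTO dimCustomers (CustomerID, Name, Country, StartDate, EndDate, Flag) "
        "VALUES (?, ?, ?, ?, ?, 'Y')",
        (customer_id, name, new_country, new_start, new_end),
    )


apply_type2(conn, "John1", "Australia", "31/12/2014", "01/01/2015", "31/12/2015")
type2_rows = conn.execute("SELECT * FROM dimCustomers ORDER BY CustomerKey").fetchall()
for row in type2_rows:
    print(row)
```

Fact rows loaded before the change keep pointing at CustomerKey 1, so historical reports still show New Zealand.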

Page 8: SCD Type 3

SCD Type 3 - Adding a new column

Only the current and the previous value of the dimension attribute are kept in the database.

The new value is loaded into the 'current' column and the old one into the 'previous' column.

History is limited to the number of columns created for storing historical data.

This is the least commonly used technique.

Before the change:

CustomerKey  CustomerID  Name  Current Country  Previous Country
1            John1       John  New Zealand

After the change:

CustomerKey  CustomerID  Name  Current Country  Previous Country
1            John1       John  Australia        New Zealand
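The Type 3 column shift can be sketched in a single UPDATE. This is an illustration only; the column names CurrentCountry and PreviousCountry and the helper apply_type3 are assumed names. It relies on standard SQL behaviour: the right-hand sides of SET are evaluated against the row's values before the update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dimCustomers ("
    "CustomerKey INTEGER PRIMARY KEY, CustomerID TEXT, Name TEXT, "
    "CurrentCountry TEXT, PreviousCountry TEXT)"
)
conn.execute("INSERT INTO dimCustomers VALUES (1, 'John1', 'John', 'New Zealand', NULL)")


def apply_type3(conn, customer_id, new_country):
    """Move the current value to 'previous', then store the new value."""
    conn.execute(
        "UPDATE dimCustomers "
        "SET PreviousCountry = CurrentCountry, CurrentCountry = ? "
        "WHERE CustomerID = ?",
        (new_country, customer_id),
    )


apply_type3(conn, "John1", "Australia")
type3_rows = conn.execute("SELECT * FROM dimCustomers").fetchall()
print(type3_rows)
```

A second change would overwrite PreviousCountry, which is the limitation the slide describes: history is only as deep as the number of extra columns.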

Page 9: SCD Type 4

SCD Type 4 - Using a historical table

A separate historical table tracks all historical changes to a dimension's attributes, with one historical table per dimension.

The 'main' dimension table keeps only the current data.

Main table:

CustomerKey  CustomerID  Name  Country
1            John1       John  Australia

Historical table:

CustomerKey  CustomerID  Name  Country      StartDate   EndDate
1            John1       John  New Zealand  01/01/2014  31/12/2014
1            John1       John  Australia    01/01/2015  31/12/2015
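The Type 4 split between a main table and a historical table can be sketched as follows. This is a minimal illustration with assumed names (dimCustomersHistory, apply_type4); the dates are supplied by the caller so that the output reproduces the slide's two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dimCustomers ("
    "CustomerKey INTEGER PRIMARY KEY, CustomerID TEXT, Name TEXT, Country TEXT)"
)
conn.execute(
    "CREATE TABLE dimCustomersHistory ("
    "CustomerKey INTEGER, CustomerID TEXT, Name TEXT, Country TEXT, "
    "StartDate TEXT, EndDate TEXT)"
)
conn.execute("INSERT INTO dimCustomers VALUES (1, 'John1', 'John', 'New Zealand')")
conn.execute(
    "INSERT INTO dimCustomersHistory VALUES "
    "(1, 'John1', 'John', 'New Zealand', '01/01/2014', '31/12/2014')"
)


def apply_type4(conn, customer_key, new_country, start_date, end_date):
    """Overwrite the main table, then append the new version to the history."""
    conn.execute(
        "UPDATE dimCustomers SET Country = ? WHERE CustomerKey = ?",
        (new_country, customer_key),
    )
    conn.execute(
        "INSERT INTO dimCustomersHistory "
        "SELECT CustomerKey, CustomerID, Name, Country, ?, ? "
        "FROM dimCustomers WHERE CustomerKey = ?",
        (start_date, end_date, customer_key),
    )


apply_type4(conn, 1, "Australia", "01/01/2015", "31/12/2015")
main_rows = conn.execute("SELECT * FROM dimCustomers").fetchall()
history_rows = conn.execute("SELECT * FROM dimCustomersHistory ORDER BY rowid").fetchall()
print(main_rows)
print(history_rows)
```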

Page 10: SCD Type 6

SCD Type 6 - Combining the approaches of Types 1, 2 and 3
  Type 1 = Overwrite the old value
  Type 2 = Add a new record
  Type 3 = Add a new column

CustomerKey  CustomerID  Name  CurrentCountry  HistoricalCountry  StartDate   EndDate     Flag
1            John1       John  New Zealand                        01/01/2014  31/12/2014  N
2            John1       John  Australia       New Zealand        01/01/2015  31/12/2015  Y

Page 11: Data Hierarchy

Date (Time) dimensions: Year > Quarter > Month > Week > Day

Location dimensions: Region > Country > City

Product dimensions: Category > Product Type > Product
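As a small illustration of a date hierarchy, the levels for a single day can be derived with Python's datetime module. The function name date_hierarchy is hypothetical, and using ISO week numbers for the Week level is an assumption; the slides do not say how weeks are numbered.

```python
from datetime import date


def date_hierarchy(d):
    """Return the hierarchy levels of one date, from coarse to fine."""
    return {
        "Year": d.year,
        "Quarter": (d.month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, ...
        "Month": d.month,
        "Week": d.isocalendar()[1],         # ISO 8601 week number (assumption)
        "Day": d.day,
    }


levels = date_hierarchy(date(2014, 5, 12))
print(levels)
```

A date dimension table typically stores one such row per day, so that facts can be grouped at any level of the hierarchy.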

Page 12: Dimension Modeling: A Summary

Dimensional modeling
  Dimensions: de-normalized (Star) or normalized (Snowflake and Fact Constellation)
  Slowly Changing Dimension (6 types)
    An interesting tutorial: http://www.youtube.com/watch?v=Eam2SmYgIzg
  Data hierarchy

Page 13: Physical Database Design

Database tables, indexes, partitions, summary tables

Table design
  Dimension tables and fact tables
  PKs, FKs, surrogate/natural keys, and constraints

Partition design
  Sort and group data into different partitions
  Helps speed up queries and improve scalability

Index design
  Speeds up queries

Why do we have to care so much about query speed in a DW?

Page 14: Partition Design

Partitioning
  Split a table into several smaller tables
  Partitions can be stored in a single database or across multiple databases
  Improves scalability (when storing data) and performance (when storing and querying data)
  Think about a fact table that contains 1 billion data records!

Approaches
  Vertical partitioning: each small table contains some columns of the original table
  Horizontal partitioning: each small table contains some rows of the original table
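The two partitioning approaches can be sketched on an in-memory list of rows. This is an illustration only: the fact rows, the column names and the helpers vertical_partition and horizontal_partition are hypothetical, and a real DBMS would partition at the storage level rather than in application code.

```python
# Hypothetical fact rows; a real fact table would be far larger.
fact_rows = [
    {"OrderKey": 1, "Year": 1996, "ProductKey": 10, "TotalPrice": 25.0},
    {"OrderKey": 2, "Year": 1997, "ProductKey": 11, "TotalPrice": 40.0},
    {"OrderKey": 3, "Year": 1997, "ProductKey": 10, "TotalPrice": 15.0},
]


def vertical_partition(rows, key, column_groups):
    """Each partition keeps every row, but only the key plus one column group."""
    return [
        [{col: row[col] for col in (key, *group)} for row in rows]
        for group in column_groups
    ]


def horizontal_partition(rows, column):
    """Each partition keeps all columns, but only the rows sharing one value."""
    parts = {}
    for row in rows:
        parts.setdefault(row[column], []).append(row)
    return parts


by_columns = vertical_partition(fact_rows, "OrderKey", [["Year"], ["ProductKey", "TotalPrice"]])
by_year = horizontal_partition(fact_rows, "Year")
print(by_columns[0])
print(by_year[1997])
```

The key column is repeated in every vertical partition so that the original rows can be rebuilt by joining on it.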

Page 15: Partition Design

Vertical partitioning
  Each small table contains some columns of the original table

Page 16: Partition Design

Horizontal partitioning
  Each small table contains some rows of the original table

Which partitioning approach (horizontal or vertical) best helps a DW database?

Page 17: Index Design

An index is useful and speeds up processing when a column is used in searching/matching.

    SELECT *
    FROM Customers
    WHERE Country = 'New Zealand'

Country is used for searching in the WHERE clause, so indexing the Country column will make the query run faster.
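The effect of indexing Country can be observed with SQLite's EXPLAIN QUERY PLAN. This is a sketch under assumptions: the sample rows, the index name idx_customers_country and the helper query_plan are hypothetical, and the exact plan wording differs between SQLite versions, so the code only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID TEXT, Name TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [("John1", "John", "New Zealand"), ("Mary1", "Mary", "Australia")],
)

QUERY = "SELECT * FROM Customers WHERE Country = 'New Zealand'"


def query_plan(conn, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = query_plan(conn, QUERY)  # without an index: a full table scan
conn.execute("CREATE INDEX idx_customers_country ON Customers (Country)")
plan_after = query_plan(conn, QUERY)   # with the index: an index search

print(plan_before)
print(plan_after)
```

The query returns the same rows either way; only the access path changes, which is what matters once the table holds millions of rows.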

Page 18: Index Design

What about queries in an OLAP database?

    SELECT p.ProductName, SUM(f.TotalPrice) AS [Total Revenue]
    FROM dimProducts p, dimCustomers c, factOrders f, dimTime t
    WHERE f.ProductKey = p.ProductKey
      AND f.CustomerKey = c.CustomerKey
      AND f.OrderDateKey = t.TimeKey
      AND c.Country = 'UK'
      AND t.QuaterOfYear = 1
      AND t.Year IN (1996, 1997, 1998)
    GROUP BY p.ProductName
    ORDER BY p.ProductName

In the query above, which columns should be indexed?

Page 19: OLAP Cubes

Page 20: OLAP Cubes

An OLAP cube is an array of data understood in terms of its 0 or more dimensions.

You can make an OLAP cube from any DW schema. For example, a star schema with 5 dimensions can be turned into a cube with 3 dimensions.

Figure source: http://visibledata.wordpress.com/data/datacloud/datacube/

Page 21: OLAP Cubes

What does a cube represent?
  Dimensions
  Data cell = the fact that relates to all dimensions
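One simple way to picture a cube is a mapping from a tuple of dimension values to a measure. The sketch below builds such a mapping from fact rows; the rows, the dimension order (quarter, item, city) and the name build_cube are hypothetical, and real OLAP engines use far more elaborate storage.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, item, city, units_sold).
facts = [
    ("Q1", "Mobile", "Auckland", 5),
    ("Q1", "Mobile", "Auckland", 3),
    ("Q1", "Modem", "Sydney", 2),
    ("Q2", "Mobile", "Sydney", 7),
]


def build_cube(facts):
    """Aggregate facts into cells keyed by their dimension coordinates."""
    cube = defaultdict(int)
    for *coords, measure in facts:
        cube[tuple(coords)] += measure
    return dict(cube)


cube = build_cube(facts)
print(cube[("Q1", "Mobile", "Auckland")])
```

Each key is one data cell's position along all dimensions, and the value is the fact stored in that cell.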

Page 22: Cube Operations

With a cube, you can:
  Slice
  Dice
  Drill down
  Roll up
  Pivot

Page 23: Cube Operations: Slice

Slice operation
  Creates a rectangular subset of a cube with fewer dimensions by choosing a single value for one of its dimensions
  The number of dimensions is reduced by one (e.g., from 3 dimensions to 2)

Figure source: http://www.tutorialspoint.com/dwh/dwh_olap.htm
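A slice can be sketched on a cube stored as a dictionary keyed by (quarter, item, city) tuples. The cube contents and the helper slice_cube are hypothetical; the point is only that fixing one dimension to a single value removes that dimension from the result.

```python
# Hypothetical 3-dimensional cube: (quarter, item, city) -> units sold.
cube = {
    ("Q1", "Mobile", "Auckland"): 8,
    ("Q1", "Modem", "Sydney"): 2,
    ("Q2", "Mobile", "Sydney"): 7,
}


def slice_cube(cube, axis, value):
    """Keep cells where dimension `axis` equals `value`, then drop that axis."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }


q1_slice = slice_cube(cube, 0, "Q1")  # fix the time dimension to Q1
print(q1_slice)
```

The keys of the result have two components instead of three, matching the slide's "from 3 dimensions to 2".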

Page 24: Cube Operations: Dice

Dice operation
  Produces a subcube by letting the analyst pick specific values on multiple dimensions
  The number of dimensions is not reduced

Figure source: http://www.tutorialspoint.com/dwh/dwh_olap.htm
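A dice can be sketched in the same dictionary representation. The cube contents and the helper dice_cube are hypothetical; unlike a slice, the result keeps every dimension and only restricts the values allowed on some of them.

```python
# Hypothetical 3-dimensional cube: (quarter, item, city) -> units sold.
cube = {
    ("Q1", "Mobile", "Auckland"): 8,
    ("Q1", "Modem", "Sydney"): 2,
    ("Q2", "Mobile", "Sydney"): 7,
    ("Q3", "Modem", "Auckland"): 4,
}


def dice_cube(cube, allowed):
    """Keep cells whose value on each listed axis is in the allowed set."""
    return {
        key: measure
        for key, measure in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


# Pick Q1 or Q2 on the time axis and Sydney on the location axis.
subcube = dice_cube(cube, {0: {"Q1", "Q2"}, 2: {"Sydney"}})
print(subcube)
```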

Page 25: Cube Operations: Drill Down

Drill down operation
  Navigates among levels of data, from the summarized to the more detailed
  The number of dimensions is not reduced

Figure source: http://www.tutorialspoint.com/dwh/dwh_olap.htm
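Drill down can be sketched as moving from a quarter-level total to the month-level numbers behind it. The data, the month-to-quarter mapping and the function names are hypothetical; the sketch assumes the detailed (monthly) facts are still available, since a summary alone cannot be drilled into.

```python
from collections import defaultdict

# Hypothetical month-level facts: (month, units_sold).
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}
monthly_facts = [("Jan", 4), ("Feb", 6), ("Mar", 1), ("Apr", 9)]


def totals_by_quarter(facts):
    """Summarized view: one total per quarter."""
    totals = defaultdict(int)
    for month, units in facts:
        totals[MONTH_TO_QUARTER[month]] += units
    return dict(totals)


def drill_down(facts, quarter):
    """Detailed view: the monthly totals behind one quarter."""
    detail = defaultdict(int)
    for month, units in facts:
        if MONTH_TO_QUARTER[month] == quarter:
            detail[month] += units
    return dict(detail)


summary = totals_by_quarter(monthly_facts)
q1_detail = drill_down(monthly_facts, "Q1")
print(summary)
print(q1_detail)
```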

Page 26: Cube Operations: Roll Up

Roll up operation
  Summarizes the data along a dimension (by aggregation)
  Similar to GROUP BY

Figure source: http://www.tutorialspoint.com/dwh/dwh_olap.htm
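Roll up can be sketched as replacing each value on one dimension with its parent in the hierarchy and summing the cells that collide. The cube contents, the city-to-country mapping and the helper roll_up are hypothetical.

```python
from collections import defaultdict

# Hypothetical 2-dimensional cube: (quarter, city) -> units sold.
cube = {
    ("Q1", "Auckland"): 8,
    ("Q1", "Wellington"): 2,
    ("Q1", "Sydney"): 5,
}
CITY_TO_COUNTRY = {
    "Auckland": "New Zealand",
    "Wellington": "New Zealand",
    "Sydney": "Australia",
}


def roll_up(cube, axis, parent_of):
    """Aggregate cells after mapping dimension `axis` up one hierarchy level."""
    rolled = defaultdict(int)
    for key, measure in cube.items():
        parent_key = key[:axis] + (parent_of[key[axis]],) + key[axis + 1:]
        rolled[parent_key] += measure
    return dict(rolled)


by_country = roll_up(cube, 1, CITY_TO_COUNTRY)
print(by_country)
```

This is the same computation as a SQL GROUP BY on the parent level with SUM as the aggregate.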

Page 27: Cube Operations: Pivot

Pivot operation
  Rotates the cube in space to see its various faces

Figure source: http://www.tutorialspoint.com/dwh/dwh_olap.htm
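In a dictionary-based cube, a pivot amounts to reordering the dimensions in each key; the measures themselves do not change. The cube contents and the helper pivot are hypothetical.

```python
# Hypothetical 3-dimensional cube: (quarter, item, city) -> units sold.
cube = {
    ("Q1", "Mobile", "Auckland"): 8,
    ("Q2", "Modem", "Sydney"): 2,
}


def pivot(cube, order):
    """Reorder the dimensions of every cell key according to `order`."""
    return {tuple(key[i] for i in order): measure for key, measure in cube.items()}


# View the cube as (city, quarter, item) instead of (quarter, item, city).
pivoted = pivot(cube, (2, 0, 1))
print(pivoted)
```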

Page 28: A friendly reminder: Assignment 2

Submission
  Both Phase-1 and Phase-2 (separate submissions)
  Due Monday 26 October at 9:30am
  Next week has a workshop
  Marking of Phase-2 does NOT rely on the DQLog produced in Phase-1

Interview sessions
  Will be conducted case by case (not all students are required)
  Maximum penalty for cheating

Page 29: What's next?

Assignment 2 Q/A

Continue working on worksheets
  Last chance to work and submit…

Page 30: Questions?