solutions 5

Upload: cuonglun

Post on 07-Oct-2015

6 views

Category:

Documents


0 download

DESCRIPTION

Solutions 5

TRANSCRIPT

  • OLAP and Data Warehousing

    1. Consider the following situation: the sales department of a supermarket chain wants to have a system to support the strategic planning and evaluation of promotions. To this end, they need sales information over the different stores of the supermarket chain. For their analysis tasks they want to compute average sales and total sales, for different product types (e.g., food, non-food), brands, for different stores at different levels: province, country, and for different time periods: per year, month, quarter, semester and also by day of the week.

    a) How would you conceptually model the data needed by the sales department as a data cube? E.g., what are the measures, the dimensional attributes, the hierarchies, the aggregations that are needed?

    The dimensional attributes are Product, Store, and Date. The measurement attributes are Total_Sales and Average_Sales The hierarchies are as follows: Product Brand Store Province Country Date Month Semester Year Weekday

    b) Given the cube of part a), explain how you would construct the answers to the following queries with the operations slice-and-dice, pivot, roll-up, and drill-down. If necessary, indicate in which cell(s) of the constructed cube the answer can be found:

    a. Give the total overall sales per store.

    Cell (s,all,all) of the original cube. Slice on Product=all and Date=all. The measure is Total_Sales.

    b. Give an overview of the average sales per month per province.

    The measure is Average_Sales. Slice on Product=all. Roll-up Store to Province, and Date to Month. Represent it using a pivot-table on dimensions Store and Date.

    c. Give the sub-cube with only dimensions store at level province and date at level month for the average and total sales for the period 1999 till 2005.

    Slice: date must be in 19992005. Roll-up Store to Province, Date to Month.

  • 2. Given the setting of question 1, give a relational schema for this data. a) Give a SQL:1999 expression that produces the data-cube (i.e., contains all

    aggregates of the cube using the null-value in an attribute to represent aggregation on the corresponding dimension). How do you handle the multiple measures? The hierarchy?

    Relational schema: Product(pID, brand) Store(sID, province, country) Date(Day,Weekday,Month,Semester,Year) Sales(pID,sID,Day,Amount)

    SELECT P.pID, P.Brand, S.sID, S.Province,S.Country, D.Day, NULL as D.Weekday, D.Month, D.Semester, D.Year, average(A.Amount) as Average, sum(A.Amount) as Total

    FROM Product P, Store S, Date D, Sales A WHERE P.pID = A.pID and S.sID = A.sID and D.Day = A.Day GROUP BY rollup(P.Brand, P.pID), rollup(S.Country, S.Province, S.sID), rollup(Year, Semester, Month, Day) UNION SELECT P.pID, P.Brand,

    S.sID, S.Province,S. Country, D.Day, D.Weekday, NULL as D.Month, NULL as D.Semester, NULL as D.Year, average(A.Amount) as Average, sum(A.Amount) as Total

    FROM Product P, Store S, Date D, Sales A WHERE P.pID = A.pID and S.sID = A.sID and D.Day = A.Day GROUP BY rollup(P.Brand, P.pID), rollup(S.Country, S.Province, S.sID), rollup(WeekDay, Day)

    b) Give SQL:1999 expressions for the queries in 1b).

    Let Cube be the result of the query in 2a). Give the total overall sales per store.

    SELECT Store, Total FROM Cube WHERE Brand is NULL and Year is NULL and Weekday is NULL and Store is not NULL

    Give an overview of the average sales per month per province.

    SELECT Month, Province, Average FROM Cube WHERE Brand is NULL and Date is NULL and Month is not NULL and Province is not NULL and Store is NULL

  • Give the sub-cube with only dimensions store at level province and date at level month for the average and total sales for the period 1999 till 2005.

    SELECT Month, Province, Average FROM Cube WHERE Brand is NULL and Date is NULL and Month is not NULL and Province is not NULL and Store is NULL and Year >=1999 and Year

  • Hence, in total: 38 non-empty cells.

    c) (hierarchies) Suppose now that the following hierarchies are defined over the different attributes: Month Semester Year Store Size (S1 and S2 are large stores, S3 is a small store) Product Brand (P1, P2, P3 are of brand B1, P4 and P5 of brand B2) Color (P1, P3, P5 have color C1, P2 and P4 color C2)

    Include the hierarchies in the analysis you made for a) and b); i.e., compute the number of non-empty cells in the cubes taking the hierarchies into account as well. An example of a non-empty cell in the cube for the sparse relation would be the cell containing the total amount over all stores for brand B1 in the first semester.

    For a), the dense relation, the size of the relation is still 180. The size of the cube, however, becomes: (5 + 2 + 2 + 1) x (3+2+1) x (12+2+1+1) = 960

    For b), the sparse setting, the base relation still has 8 tuples. The number of non-empty cells in the cube is: 205 (this number is computed using a python script; it is assumed that all dates fall in the same year)