TRANSCRIPT
Copyright © 2006, SAS Institute Inc. All rights reserved.
SAS® 9 OLAP Server
Jochen Kirsten
Technology Manager Storage
SAS EMEA
Categories of data storage
Multidimensional
Relational
What does SAS OLAP Server do?
SAS OLAP Server is a multidimensional data store designed from the outset to provide quick access to presummarized data, generated from vast amounts of detailed data.
Why is SAS OLAP Server important?
Decision makers need fast access to accurate information. Instantaneous access to summaries of vast amounts of data is expected, so that timely decisions can be based on knowledge instead of gut feelings.
Key features include:
• A multithreaded MDX query engine
• Parallel storage
• A graphical user interface for designing OLAP data sources
• Special features that facilitate real-world use, metadata management, and cube optimization
SAS OLAP Server is a standards-compliant OLAP data source that uses multidimensional expressions (MDX) to query and navigate through multidimensional information.
Special features
Real-world use
• Time dimension for time-based calculations
• Geographic dimension for GIS support
• Parallel drill hierarchies
• Support for ragged and unbalanced hierarchies
Centralized maintenance
• All metadata is stored in the SAS Metadata Server
• SAS Metadata Server maintains centralized security information
• SAS Management Console is used to administer the OLAP Server
Open architecture
• SAS OLAP Server runs on all major hardware platforms
• SAS OLAP Server stores its data in Unicode
• SAS OLAP Server can be accessed using Java or OLE DB for OLAP
SQL can do – can it?
[Figure: a spectrum running from SQL to MDX. At the SQL end: "pure" relational queries. In between: OLAP-type features added by database systems or DOLAP applications. At the MDX end: "pure" multidimensional queries.]
OLAP vs. SQL example
Show me the number of active network ports for the 5 largest customers at the end of each month over the last 12 months.
… this is what it looks like in SQL:
OLAP vs. SQL Example (cont.)
from ca_status i4
where trunc (i4.status_time) <= add_months (trunc (sysdate, 'MM'), -2) - 1
union all
select distinct i5.customer_id, i5.status_time, i5.ustate, i5.urole, i5.login, i5.ugroup,
       add_months (trunc (sysdate, 'MM'), -3) - 1
from ca_status i5
where trunc (i5.status_time) <= add_months (trunc (sysdate, 'MM'), -3) - 1
union all
select distinct i6.customer_id, i6.status_time, i6.ustate, i6.urole, i6.login, i6.ugroup,
       add_months (trunc (sysdate, 'MM'), -4) - 1
from ca_status i6
where trunc (i6.status_time) <= add_months (trunc (sysdate, 'MM'), -4) - 1
) j
where exists (select 'x'
              from nicetec.ca_status_last v,
                   nicetec.import_log l
              where v.customer_id = j.customer_id
                and v.import_id = l.import_id
                and l.import_time >= least (j.monat, sysdate - 1)
              -- the user has been modified this month or later
              -- last_import only shows the date of the last status change
              -- which was not deleted at the point in time marked x
             )
) where rk=1
OLAP vs. SQL example (cont.)
SQL cannot handle inter-row calculations
To overcome this limitation, several intermediate steps are required
Intermediate steps can be either sub-select statements (memory consumption) or temporary tables (storage consumption)
SQL can handle dimensions (star schema), but it cannot deal with hierarchies
• SQL does not fulfill a major requirement of analysis
• SQL and OLAP problems are two individual domains
Same result in MDX
TopCount([Customers].[AllCustomers].Children, 5,
         (Measures.[ActiveNetworkPorts],
          ClosingPeriod(Time.[YMD].[Day], Time.CurrentMember)))
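To make the MDX expression more concrete, the following Python sketch mimics what the combination of TopCount and ClosingPeriod computes: it finds the closing day of a month and ranks customers by the measure on that day. The sample data and all function names are hypothetical, and the closing day is simplified to "the last day of the month that appears in the data".

```python
from datetime import date

# Hypothetical sample: active network ports per (customer, day).
ports = {
    ("Acme", date(2006, 1, 30)): 120,
    ("Acme", date(2006, 1, 31)): 125,
    ("Bolt", date(2006, 1, 15)): 999,  # mid-month value, not used for ranking
    ("Bolt", date(2006, 1, 31)): 300,
    ("Core", date(2006, 1, 31)): 80,
}

def closing_period(days, year, month):
    """Last day of the given month that appears in the data (simplified ClosingPeriod)."""
    return max(d for d in days if (d.year, d.month) == (year, month))

def top_count(members, n, value_of):
    """The n members with the largest value, largest first (simplified TopCount)."""
    return sorted(members, key=value_of, reverse=True)[:n]

def top_customers(data, n, year, month):
    """Rank customers by the measure on the closing day of the month."""
    closing = closing_period({day for (_, day) in data}, year, month)
    customers = {customer for (customer, _) in data}
    return top_count(customers, n, lambda c: data.get((c, closing), 0))

print(top_customers(ports, 2, 2006, 1))
```

Note that the ranking uses only the value on the closing day, which is why the large mid-month figure for "Bolt" plays no role.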
Who uses OLAP?
Everybody who needs to do fast analysis of shared multidimensional information (FASMI)
• Finance departments
• Marketing departments
• Manufacturing sector
• Sales departments
• …
Basic terminology of a cube
Dimensions consist of
• Dimension Name
• Level
• Hierarchy
• Member
[Figure: the Time dimension drawn as a tree, shown in several build steps of the same slide]
Time
1999 2000 2001 (level YEAR)
Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 (level QUARTER)
Level of Detail
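The four terms can be illustrated with a small Python data model. This is only a sketch: the class names, the hierarchy name "YQ", and the choice to give every year four quarters are assumptions made for illustration, not the way SAS OLAP Server stores a cube.

```python
from dataclasses import dataclass, field

@dataclass
class Member:
    name: str
    level: str                      # name of the level this member belongs to
    children: list = field(default_factory=list)

@dataclass
class Hierarchy:
    name: str
    levels: list                    # ordered from coarsest to finest
    roots: list                     # members of the top level

@dataclass
class Dimension:
    name: str
    hierarchies: list

def members_at(hierarchy, level):
    """Collect the names of all members on one level, walking the tree top-down."""
    found, stack = [], list(reversed(hierarchy.roots))
    while stack:
        member = stack.pop()
        if member.level == level:
            found.append(member.name)
        stack.extend(reversed(member.children))
    return found

def quarters(year):
    return [Member(f"{year} Q{i}", "QUARTER") for i in range(1, 5)]

years = [Member(str(y), "YEAR", quarters(y)) for y in (1999, 2000, 2001)]
time_dim = Dimension("Time", [Hierarchy("YQ", ["YEAR", "QUARTER"], years)])

print(members_at(time_dim.hierarchies[0], "YEAR"))
```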
Navigation in multidimensional data
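[Figure: a geography hierarchy with Europe at the top, France and Switzerland below it, and Basel, Geneva, and Zurich below Switzerland. The diagram is annotated with the MDX navigation functions .CurrentMember, .PrevMember, .Parent, .Children, and .LastChild.]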
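The navigation functions named on this slide can be sketched in Python as methods on a tree node. The class and method names are hypothetical, and the order of France and Switzerland under Europe is assumed; `prev_member` looks at all members on the same level, not only at siblings.

```python
class Node:
    """A hierarchy member that knows its parent and its ordered children."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def depth(self):
        return 0 if self.parent is None else self.parent.depth() + 1

    def root(self):
        return self if self.parent is None else self.parent.root()

    def level_members(self):
        """All members at the same depth as this node, in tree order."""
        target = self.depth()

        def walk(node, depth):
            if depth == target:
                yield node
            else:
                for child in node.children:
                    yield from walk(child, depth + 1)

        return list(walk(self.root(), 0))

    def prev_member(self):
        """Previous member on the same level, or None for the first one."""
        peers = self.level_members()
        index = peers.index(self)
        return peers[index - 1] if index > 0 else None

    def last_child(self):
        return self.children[-1] if self.children else None


basel, geneva, zurich = Node("Basel"), Node("Geneva"), Node("Zurich")
france = Node("France")
switzerland = Node("Switzerland", [basel, geneva, zurich])
europe = Node("Europe", [france, switzerland])

print(zurich.prev_member().name, geneva.parent.name, switzerland.last_child().name)
```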
Unbalanced hierarchy
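[Figure: an organization chart with the members CEO, COO, Executive Secretary, Director of Comms., and Comms. Specialist, whose branches end at different depths.]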
Ragged hierarchy
[Figure: a geography hierarchy with the members America, United States, California, San Francisco, and Washington DC.]
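The difference between the two hierarchy types can be checked mechanically. In the Python sketch below, a hierarchy is unbalanced when its leaf members sit on different levels, and ragged when some member hangs more than one level below its parent. The exact tree shapes are assumptions reconstructed from the member names on the two slides, and the numeric level indices are hypothetical.

```python
# Each member is (name, level_index, children). Level indices are hypothetical.
org = ("CEO", 0, [
    ("COO", 1, [
        ("Director of Comms.", 2, [("Comms. Specialist", 3, [])]),
    ]),
    ("Executive Secretary", 1, []),
])

geo = ("America", 0, [
    ("United States", 1, [
        ("California", 2, [("San Francisco", 3, [])]),
        ("Washington DC", 3, []),      # city directly under the country
    ]),
])

def leaf_levels(member):
    """Yield the level index of every leaf below (or at) this member."""
    _, level, children = member
    if not children:
        yield level
    for child in children:
        yield from leaf_levels(child)

def is_unbalanced(member):
    """Leaves end on different levels."""
    return len(set(leaf_levels(member))) > 1

def is_ragged(member):
    """Some child sits more than one level below its parent."""
    _, level, children = member
    return any(child[1] - level > 1 or is_ragged(child) for child in children)

print(is_unbalanced(org), is_ragged(org))
print(is_unbalanced(geo), is_ragged(geo))
```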
Star schema
A Star Schema is a dimensional model created by mapping data entities from operational systems
It has a central table (fact table) that links all the other tables (dimension tables) together
Sample star schema
SALEDTL (Sales fact detail tbl)
  TIMEKEY   Time dim. key
  CUSTKEY   Customer dim. key
  PRDKEY    Product dim. key
  PONO      PO number
  POLINNO   PO line number
  QTYSOLD   Quantity sold
  UNITPRIC  Unit price
  SALEAMT   Sales amount

CUSTDIM (Customer dim tbl)
  CUSTKEY   Customer dim. key
  CUSTNO    Customer loyalty num
  CUSTLNAM  Customer last name
  CUSTFNAM  Customer first names
  CUSTADDR  Customer address
  CUSTPOST  Customer postal code
  CUSTREGN  Customer region
  CUSTCNTR  Customer country

PRDDIM (Product dim tbl)
  PRDKEY    Product dim. key
  SKU       Stock keeping unit
  PRDDSC    Product description
  PRGCOD    Product group code
  PRGDSC    Product group desc.
  BRNDNAM   Brand name
  COLRDSC   Colour description

TIMEDIM (Time dim tbl)
  TIMEKEY   Time dim. key
  YYMM      Calendar month
  YYWW      Calendar week
  JULDY     Julian day number
  FYR       Fiscal year
  CYR       Calendar year
  MTHNO     Month number
  WK        Week number
  DOW       Day of week number
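A short Python sketch using the standard `sqlite3` module shows how the fact table links the dimension tables together. It uses a reduced subset of the columns listed in the sample schema; the column types and the inserted rows are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CUSTDIM (CUSTKEY INTEGER PRIMARY KEY, CUSTLNAM TEXT, CUSTCNTR TEXT);
CREATE TABLE TIMEDIM (TIMEKEY INTEGER PRIMARY KEY, CYR INTEGER, MTHNO INTEGER);
CREATE TABLE SALEDTL (TIMEKEY INTEGER, CUSTKEY INTEGER,
                      QTYSOLD INTEGER, UNITPRIC REAL, SALEAMT REAL);

-- Hypothetical sample rows.
INSERT INTO CUSTDIM VALUES (1, 'Meier', 'DE'), (2, 'Dupont', 'FR');
INSERT INTO TIMEDIM VALUES (10, 2005, 12), (11, 2006, 1);
INSERT INTO SALEDTL VALUES (10, 1, 2, 5.0, 10.0),
                           (11, 1, 1, 5.0, 5.0),
                           (11, 2, 4, 5.0, 20.0);
""")

# Join the fact table to two dimensions and summarize sales by year and country.
rows = con.execute("""
    SELECT t.CYR, c.CUSTCNTR, SUM(f.SALEAMT)
    FROM SALEDTL f
    JOIN CUSTDIM c ON c.CUSTKEY = f.CUSTKEY
    JOIN TIMEDIM t ON t.TIMEKEY = f.TIMEKEY
    GROUP BY t.CYR, c.CUSTCNTR
    ORDER BY t.CYR, c.CUSTCNTR
""").fetchall()

for row in rows:
    print(row)
```

Every dimension attribute (year, country) is reached through a key stored in the fact table, which is the "central table" role described above.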
OLAP concepts
MOLAP (multidimensional OLAP)
• MOLAP is the default storage technology used by SAS OLAP Server. MOLAP uses SAS's own storage (SPDE) to store aggregations in a format that is optimized for multidimensional data structures.
ROLAP (relational OLAP)
• ROLAP uses a relational database as storage for multidimensional data. In most cases a star schema is the foundation, and aggregations are linked in where appropriate.
HOLAP (hybrid OLAP)
• HOLAP is a combination of MOLAP and ROLAP that combines the benefits of both worlds.
MOLAP in SAS9 OLAP Server
By default, SAS9 OLAP Server uses MOLAP to store aggregations.
Cube designers can specify which file systems should be used to store these aggregations.
The MOLAP storage option uses libraries that are optimized for accessing multidimensional data. In their fundamental structure they resemble SPDE libraries, but MOLAP uses advanced clustering and indexing to provide the fastest possible access to data.
ROLAP in SAS9 OLAP Server
In certain situations (very high cardinalities for certain levels) it can be worthwhile to use relational databases to store the aggregations.
You can select “Cube will also use aggregated data from other tables” in the initial screen, and then point each aggregation to the respective table in a relational database system.
ROLAP can be used to make sure that data is kept within a single storage entity (relational database) for data security reasons.
Take all performance impacts into consideration when choosing ROLAP: your RDBMS and your network now determine performance.
HOLAP in SAS9 OLAP Server
HOLAP is the combination of ROLAP and MOLAP
• Your OLAP data source is based on a fundamental aggregation (NWAY) stored in a relational database system
• All aggregations defined for a HOLAP cube can either be generated by the OLAP server (MOLAP) or point to pre-calculated aggregation tables stored in a relational database system
Using HOLAP
• Cubes can compensate for large levels (with high cardinalities) by storing them in relational databases, while providing fast access to standard business views (aggregations) stored in MOLAP
• The NWAY crossing must be stored in ROLAP
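The per-aggregation storage decision described for HOLAP can be sketched as a simple rule in Python. This is an illustration only, not how SAS OLAP Server decides: the level cardinalities, the row threshold, and the use of the product of cardinalities as an upper bound for aggregation size are all assumptions.

```python
from math import prod

# Hypothetical number of members per level.
LEVEL_CARDINALITY = {"Customer": 2_000_000, "Country": 40, "Year": 5, "Month": 60}

def estimated_rows(levels):
    """Upper bound for the size of an aggregation: product of level cardinalities."""
    return prod(LEVEL_CARDINALITY[level] for level in levels)

def choose_storage(levels, is_nway=False, threshold=1_000_000):
    """Keep the NWAY and very large aggregations in the RDBMS, the rest in MOLAP."""
    if is_nway or estimated_rows(levels) > threshold:
        return "ROLAP"
    return "MOLAP"

aggregations = {
    ("Customer", "Month"): True,    # the NWAY crossing (lowest levels)
    ("Customer", "Year"): False,
    ("Country", "Month"): False,
    ("Country", "Year"): False,
}

for levels, is_nway in aggregations.items():
    print(levels, choose_storage(levels, is_nway))
```

With these numbers, the customer-level aggregations land in the relational database while the country-level business views stay in MOLAP.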
Structure of the OLAP environment
Querying a Cube
• MDX is sent to the OLAP Server
• A result set is passed back to the client
Creating a Cube
• PROC OLAP creates a cube (used by Cube builder)
• PROC OLAP is executed on a Workspace Server
[Figure labels: MDX Query, Result Set, PROC OLAP, Write Cube Data]
SAS applications around the OLAP Server
SAS Management Console
• Register Servers
• Register Libraries
• Register Users
• Assign User Rights
SAS OLAP Cube Studio
• Register Tables
• Register OLAP Schemas
• Design Cubes
• Create Cubes
Scalability in SAS OLAP Server
One thread for every query
One aggregation selection for every region affected
Parallel storage
Configurable cube cache
Using parallel storage for cubes
[Figure: an OLAP cube stored under a single physical path, consisting of the fundamental cube data (NWAY), additional aggregations, and the index.]
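The relationship between the NWAY and additional aggregations can be sketched in Python: the NWAY summarizes the detail data at the lowest level of every dimension, and coarser aggregations can then be derived from the NWAY instead of from the detail rows. The sample rows and the helper function are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows: (country, city, year, sales).
detail = [
    ("France", "Lyon", 2005, 10.0),
    ("France", "Lyon", 2005, 5.0),
    ("France", "Nancy", 2005, 7.0),
    ("Germany", "Berlin", 2006, 20.0),
]

def aggregate(rows, key_positions):
    """Sum the last column of each row, grouped by the selected key columns."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in key_positions)] += row[-1]
    return dict(totals)

# NWAY: one cell per combination of the lowest levels (city and year here).
nway = aggregate(detail, (0, 1, 2))

# An additional, coarser aggregation computed from the NWAY cells.
nway_rows = [key + (value,) for key, value in nway.items()]
by_country = aggregate(nway_rows, (0,))

print(nway)
print(by_country)
```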
Benefits of parallel storage
[Figure: two layouts compared. Left: one file system for all data. Right: data is spread across multiple file systems.]
Basic cube types - MOLAP
MOLAP cubes can be based on a single flat table or on a star schema
MOLAP stores all cube data inside the SAS OLAP storage facility, optimized for multidimensional data
[Figure: detailed data in a single SAS data set OR detailed data stored in a star schema feeds the OLAP storage of the MOLAP cube.]
Basic cube types - HOLAP
HOLAP cubes can be based on a single flat table or on a star schema
HOLAP stores cube data wherever it is appropriate:
• SAS Storage
• Flat files
• RDBMS
Storage can be defined for every aggregation individually
[Figure: a HOLAP cube. RDBMS (ROLAP) holds aggregations with high-cardinality dimensions; OLAP storage (MOLAP) holds aggregations with low-cardinality dimensions.]
Parallel drill hierarchies
Time Dimension
Hierarchy 1: Time by Month
• Year
• Quarter
• Month
Hierarchy 2: Time by Week
• Year
• Week
• Weekday
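Two parallel hierarchies over the same Time dimension can be sketched in Python by deriving both drill paths from one date. The function names are hypothetical, and the week path assumes ISO 8601 week numbering, which the slide does not specify.

```python
from datetime import date

def time_by_month(d):
    """Drill path Year > Quarter > Month for one day."""
    return (d.year, f"Q{(d.month - 1) // 3 + 1}", d.month)

def time_by_week(d):
    """Drill path Year > Week > Weekday, using ISO week numbering (an assumption)."""
    iso_year, iso_week, iso_weekday = d.isocalendar()
    return (iso_year, iso_week, iso_weekday)

day = date(2006, 3, 15)
print(time_by_month(day))
print(time_by_week(day))
```

Both paths describe the same day, so a measure recorded for it rolls up to the same yearly total whichever hierarchy is used for drilling, as long as the ISO year and the calendar year coincide.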
Member properties
Location Dimension
Country: France, Germany, Italy
City: Lyon, Nancy, Cologne, Berlin, Venice, Milano
Population: 57.000.000
Population: 82.000.000
Population: 58.000.000
Multiple languages in one cube
[Figure: French data (FR), English data (UK), and Russian data (RU) loaded into a single cube.]
All languages are loaded into one single Unicode cube.
The cube is queried using standard MDX.
The session reports the client language to the server.
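The idea of one cube serving several languages can be sketched in Python as a caption lookup keyed by the language the client session reports. The data structure, the member key, and the fallback to English are assumptions made for illustration and do not describe the server's internal behavior.

```python
# Hypothetical captions for one member, stored side by side as Unicode strings.
CAPTIONS = {
    "country.DE": {"fr": "Allemagne", "en": "Germany", "ru": "Германия"},
}

def caption(member_key, client_locale, default="en"):
    """Return the member caption in the client's language, else in the default one."""
    names = CAPTIONS[member_key]
    language = client_locale.replace("-", "_").split("_")[0].lower()
    return names.get(language, names[default])

print(caption("country.DE", "fr_FR"))
print(caption("country.DE", "ru_RU"))
print(caption("country.DE", "it_IT"))  # no Italian caption, falls back to English
```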
Accessing SAS OLAP Server
Java Clients
• SAS Web Report Studio
• SAS Visual Data Explorer
• SAS Information Delivery Portal
Microsoft Clients
• SAS Enterprise Guide
• Microsoft Excel
• ProClarity
Work in progress …
User / admin cancellation of run-away queries
Incremental cube updates
Allow for cubes with dimensions of very large cardinality
• 2**64 per hierarchy level
Security GUI in SMC
Support for visual and security based totalling capabilities