OLAP Solutions using Pentaho Analysis Services
Gabriele Pozzani
PAS
● Pentaho Analysis Services (PAS) provides
  – OLAP capabilities
  – Interactive analysis of data through a cross-tab interface
  – No need to write a query
  – A front-end that retrieves and formats data, supporting
    ● Drill-down
    ● Drill-up
    ● Slicing
    ● Dicing
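To make these operations concrete, here is a minimal Python sketch over a hypothetical in-memory fact list. The data, dimension names and function names are illustrative only and are not part of PAS: drilling up and down changes the grouping level, slicing fixes one dimension to a single member, and dicing restricts several dimensions to sets of members.

```python
from collections import defaultdict

# Hypothetical toy cube: each fact row is (year, quarter, region, product, sales)
DIMS = ("year", "quarter", "region", "product")
FACTS = [
    ("2011", "Q1", "North", "Tea", 10),
    ("2011", "Q2", "North", "Coffee", 20),
    ("2011", "Q1", "South", "Tea", 5),
    ("2012", "Q1", "North", "Tea", 7),
    ("2012", "Q2", "South", "Coffee", 8),
]


def aggregate(rows, by):
    """Sum sales grouped by the given dimensions (the aggregation level)."""
    positions = [DIMS.index(d) for d in by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in positions)] += row[-1]
    return dict(totals)


def slice_(rows, dim, member):
    """Slicing: keep only the rows where one dimension equals one member."""
    i = DIMS.index(dim)
    return [row for row in rows if row[i] == member]


def dice(rows, **member_sets):
    """Dicing: restrict several dimensions to sets of members (a sub-cube)."""
    return [
        row for row in rows
        if all(row[DIMS.index(d)] in allowed for d, allowed in member_sets.items())
    ]


by_year = aggregate(FACTS, ["year"])                # drill-up: coarser level
by_quarter = aggregate(FACTS, ["year", "quarter"])  # drill-down: finer level
north = aggregate(slice_(FACTS, "region", "North"), ["year"])
diced = aggregate(dice(FACTS, year={"2011"}, product={"Tea", "Coffee"}), ["region"])
print(by_year)  # {('2011',): 35, ('2012',): 15}
```

Drilling down from year to quarter never changes the yearly totals; it only splits them into finer cells.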
PAS components (I)
● PAS consists of four components
1. Mondrian OLAP Engine: receives MDX queries from JPivot and returns a multi-dimensional result-set
• Included in the Pentaho Server
2. Schema Workbench: designs and tests Mondrian cube schemas
• Mondrian uses the cube schemas to interpret MDX and translate it into SQL queries on an RDBMS
PAS components (II)
3. JPivot analysis front-end: a Java-based analysis tool that serves as a front-end for OLAP cubes
4. Aggregate Designer: a tool for generating aggregate tables that speed up the analytical engine
Schemas
● Mondrian schemas are XML documents that
  – Describe multidimensional cubes
  – Describe the mapping between the multidimensional and the relational model
  – Are used to translate MDX into SQL
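A minimal sketch of what such a schema looks like, generated here with Python's xml.etree.ElementTree. The element and attribute names (Schema, Cube, Table, Dimension, Hierarchy, Level, Measure, foreignKey, primaryKey, aggregator) follow the Mondrian schema format; the schema, table and column names (sales_fact, time_by_day, the_year, unit_sales, ...) are hypothetical. In practice PSW writes this XML for you.

```python
import xml.etree.ElementTree as ET


def build_schema():
    """Build a one-cube Mondrian-style schema (hypothetical table/column names)."""
    schema = ET.Element("Schema", name="SalesSchema")
    cube = ET.SubElement(schema, "Cube", name="Sales")
    # Fact table the cube reads from
    ET.SubElement(cube, "Table", name="sales_fact")
    # Private time dimension: fact column time_id joins the time_by_day table
    time_dim = ET.SubElement(cube, "Dimension", name="Time",
                             type="TimeDimension", foreignKey="time_id")
    hierarchy = ET.SubElement(time_dim, "Hierarchy", hasAll="true",
                              primaryKey="time_id")
    ET.SubElement(hierarchy, "Table", name="time_by_day")
    ET.SubElement(hierarchy, "Level", name="Year", column="the_year",
                  type="Numeric", uniqueMembers="true", levelType="TimeYears")
    ET.SubElement(hierarchy, "Level", name="Quarter", column="quarter",
                  uniqueMembers="false", levelType="TimeQuarters")
    # Measure: a SUM over a fact-table column
    ET.SubElement(cube, "Measure", name="Unit Sales", column="unit_sales",
                  aggregator="sum", formatString="#,###")
    return schema


xml_text = ET.tostring(build_schema(), encoding="unicode")
print(xml_text)
```

The Dimension's foreignKey and the Hierarchy's primaryKey are what let Mondrian turn an MDX member reference into a SQL join between the fact table and the dimension table.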
MDX
● MDX: Multi-Dimensional eXpressions
  – A language designed for querying OLAP databases
  – A de facto standard developed by Microsoft
http://msdn.microsoft.com/en-us/library/ms145506.aspx
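An MDX query places sets of members on axes and names the cube to read from. The Python helper below is a hypothetical sketch that assembles a basic two-axis SELECT; the cube name Sales and the member names are assumptions, loosely modelled on Mondrian's FoodMart sample.

```python
def mdx_select(cube, columns, rows):
    """Assemble a basic two-axis MDX SELECT from lists of set expressions."""
    cols = ", ".join(columns)
    rws = ", ".join(rows)
    # {{ and }} are literal braces around each axis set
    return (f"SELECT {{{cols}}} ON COLUMNS,\n"
            f"  {{{rws}}} ON ROWS\n"
            f"FROM [{cube}]")


query = mdx_select(
    "Sales",
    ["[Measures].[Unit Sales]", "[Measures].[Store Sales]"],
    ["[Time].[1997].Children"],
)
print(query)
```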
Pentaho Schema Workbench
● PSW is a graphical tool
  – To create Mondrian schemas
  – To publish schemas to the Pentaho Server
Connect to DB
● The first step is to establish a connection to the database
  – Options → Connections...
JDBC Explorer
● Once the connection has been established, you can explore the database
  – File → New → JDBC Explorer
Create a new schema
● The schema editor can:
  – Create a new schema
    ● File → New → Schema
  – Save the schema to disk
    ● As an .xml file
  – Edit object attributes
  – Switch to the XML representation of the schema
    ● View only; the XML cannot be edited there
Main tasks
● Basic tasks for defining a schema are:
1. Create a schema
2. Create cubes
  2.1. Choose a fact table
  2.2. Add measures
3. Create dimensions
  3.1. Edit the default hierarchy and choose a dimension table
  3.2. Define hierarchy levels
4. Associate dimensions with cubes
1. Create a schema
● File → New → Schema
2. Create cubes
2.1. Choose a fact table
[Screenshot: fact table settings, with the DB schema and the table name in that schema]
2.2. Add measures
3. Create dimensions (I)
● Dimensions can be added to:
  – A cube: "private dimensions", known only to the cube that contains them
  – A schema: "shared dimensions", which can be associated with multiple cubes
3. Create dimensions (II)
[Screenshot: dimension settings, with the foreign key in the fact table]
● Date/time-related dimensions have the TimeDimension type
3. Create dimensions (III)
● Ordinary dimensions have the StandardDimension type
3.1. Add/edit hierarchies
● A hierarchy is created automatically for each new dimension
● Additional hierarchies can be added to a dimension
● Each hierarchy must have a table node and one or more levels
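The last rule can be checked mechanically. A hedged sketch in Python, assuming a schema saved by PSW as .xml: the hypothetical check_hierarchies function walks every Dimension/Hierarchy pair and reports those without a Table element or without any Level. It encodes the rule literally as stated on the slide; Mondrian also accepts other relation elements (for example Join), which this sketch ignores. Table and column names are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: the Store hierarchy is missing its levels
SCHEMA_XML = """
<Schema name="Demo">
  <Cube name="Sales">
    <Table name="sales_fact"/>
    <Dimension name="Product" foreignKey="product_id">
      <Hierarchy hasAll="true" primaryKey="product_id">
        <Table name="product"/>
        <Level name="Category" column="category" uniqueMembers="true"/>
        <Level name="Product Name" column="product_name"/>
      </Hierarchy>
    </Dimension>
    <Dimension name="Store" foreignKey="store_id">
      <Hierarchy hasAll="true" primaryKey="store_id">
        <Table name="store"/>
      </Hierarchy>
    </Dimension>
    <Measure name="Unit Sales" column="unit_sales" aggregator="sum"/>
  </Cube>
</Schema>
"""


def check_hierarchies(xml_text):
    """Report every hierarchy that lacks a Table node or has no Level."""
    problems = []
    root = ET.fromstring(xml_text.strip())
    for dim in root.iter("Dimension"):
        for hier in dim.findall("Hierarchy"):
            if hier.find("Table") is None:
                problems.append(f"{dim.get('name')}: hierarchy has no Table")
            if not hier.findall("Level"):
                problems.append(f"{dim.get('name')}: hierarchy has no Level")
    return problems


problems = check_hierarchies(SCHEMA_XML)
print(problems)  # ['Store: hierarchy has no Level']
```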
3.1. Dimension table
● Same settings as for fact tables
3.2. Add hierarchy levels
4. Associate shared dimensions
● Shared dimensions can be associated with a cube by adding a "Dimension Usage"
[Screenshot: a Dimension Usage referencing a shared dimension]
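In the saved XML, a shared dimension is declared once under Schema, and each cube refers to it through a DimensionUsage element that supplies the cube's own foreign key. A minimal sketch with Python's xml.etree.ElementTree; the schema, cube, table and column names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_schema_with_shared_store():
    """Two cubes reusing one shared Store dimension (hypothetical names)."""
    schema = ET.Element("Schema", name="Retail")
    # Shared dimension: declared once at schema level, with no foreignKey here
    store = ET.SubElement(schema, "Dimension", name="Store")
    hier = ET.SubElement(store, "Hierarchy", hasAll="true", primaryKey="store_id")
    ET.SubElement(hier, "Table", name="store")
    ET.SubElement(hier, "Level", name="Country", column="store_country",
                  uniqueMembers="true")
    ET.SubElement(hier, "Level", name="City", column="store_city",
                  uniqueMembers="false")
    # Each cube links the shared dimension through its own fact-table column
    for cube_name, fact_table in [("Sales", "sales_fact"),
                                  ("Inventory", "inventory_fact")]:
        cube = ET.SubElement(schema, "Cube", name=cube_name)
        ET.SubElement(cube, "Table", name=fact_table)
        ET.SubElement(cube, "DimensionUsage", name="Store", source="Store",
                      foreignKey="store_id")
        ET.SubElement(cube, "Measure", name="Row Count", column="store_id",
                      aggregator="count")
    return schema


root = build_schema_with_shared_store()
print(ET.tostring(root, encoding="unicode"))
```

The source attribute names the shared dimension, so its hierarchy and levels are defined once and stay consistent across cubes.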
Testing and deployment
● Once schemas have been defined, they can be
  – Tested using the MDX query tool (MDX) included in PSW
  – Published to the Pentaho Server
MDX query tool (I)
● File → New → MDX Query
● If a schema editor is open, the MDX tool attempts to connect to the underlying database to load the schema definition
MDX query tool (II)
● A query can be entered in the upper pane
● The result is shown in the lower pane
Publishing the cube (I)
● File → Publish...
● The publish dialog asks for
  – The server URL
  – The publish password specified in publisher_config.xml
  – A user with publishing privileges
Publishing the cube (II)
● If the connection succeeds, a dialog appears in which you
  – Choose the location in the server's solution repository where the schema is saved
  – Specify the server-side data source used to execute the SQL queries generated from the MDX queries
JPivot
● Once a cube has been published it can be used to build analysis applications
● Pentaho provides the JPivot front-end in the Pentaho User Console
Analysis View
● Create a new analysis view, choosing
  – The schema to use
  – The cube to use, defined in that schema
New analysis view
[Screenshot: the JPivot toolbar]
Drilling
● Drilling allows the user to navigate from one level of aggregation to another
Drilling flavors
● There are four ways to drill, each producing a different result
● The drill mode can be selected in the toolbar
  – Applied to dimensions:
    ● Drill member
    ● Drill position
    ● Drill replace
  – Applied to measures:
    ● Drill through
Drill member & Drill position
● Drill member: drilling on one instance of a member is also applied to all other instances of that member
● Drill position: drilling expands only the clicked instance of the member; other instances of that member are not affected
Drill replace
● The drilled member is replaced with the drill result
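The difference between the three dimension-oriented modes can be sketched on a row axis where the same member appears twice. This is a simplified Python model, not JPivot's implementation: axis positions are tuples, the hypothetical CHILDREN map stands in for the cube's hierarchy, and drill replace is modelled as replacing every occurrence of the drilled member.

```python
# Hypothetical hierarchy: children of each member that can be drilled
CHILDREN = {"All Years": ["2011", "2012"]}


def expand(pos, col):
    """Child positions of `pos`, obtained by drilling the member in column `col`."""
    return [pos[:col] + (child,) + pos[col + 1:] for child in CHILDREN[pos[col]]]


def drill_member(positions, col, member):
    """Drill member: every occurrence of `member` is expanded (parent kept)."""
    out = []
    for pos in positions:
        out.append(pos)
        if pos[col] == member:
            out.extend(expand(pos, col))
    return out


def drill_position(positions, index, col):
    """Drill position: only the clicked position is expanded (parent kept)."""
    return positions[:index + 1] + expand(positions[index], col) + positions[index + 1:]


def drill_replace(positions, col, member):
    """Drill replace: the drilled member is replaced by its children."""
    out = []
    for pos in positions:
        out.extend(expand(pos, col) if pos[col] == member else [pos])
    return out


# Row axis: Region x Time, so "All Years" appears twice
rows = [("North", "All Years"), ("South", "All Years")]
print(drill_member(rows, 1, "All Years"))  # both instances expanded
print(drill_position(rows, 0, 1))          # only the North instance expanded
print(drill_replace(rows, 1, "All Years")) # parents gone, children remain
```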
Drill through
● It applies to measures
● It retrieves the detail rows that were rolled up into an aggregated measure value and shows them in a separate table
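Conceptually, drill through inverts the roll-up: it returns the fact rows whose aggregate produced the clicked cell. A minimal Python sketch with hypothetical detail rows and a SUM aggregator; the function names are illustrative, not a JPivot API.

```python
# Hypothetical detail (fact) rows behind the pivot table
SALES = [
    {"order_id": 1, "year": 2011, "region": "North", "amount": 120.0},
    {"order_id": 2, "year": 2011, "region": "North", "amount": 80.0},
    {"order_id": 3, "year": 2011, "region": "South", "amount": 50.0},
    {"order_id": 4, "year": 2012, "region": "North", "amount": 70.0},
]


def drill_through(rows, **cell):
    """Return the detail rows that were rolled up into the cell at `cell`."""
    return [r for r in rows if all(r[k] == v for k, v in cell.items())]


def cell_value(rows, **cell):
    """The aggregated measure shown in the pivot cell: SUM(amount)."""
    return sum(r["amount"] for r in drill_through(rows, **cell))


total = cell_value(SALES, year=2011, region="North")
details = drill_through(SALES, year=2011, region="North")
print(total)  # 200.0
for row in details:  # the detail table a drill through would display
    print(row)
```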
The OLAP Navigator (I)
● A GUI for controlling the mapping between the cube and the pivot table:
  – Which dimension is mapped to which axis
  – How multiple dimensions on one axis are ordered
  – Which slice of the cube is used in the analysis
The OLAP Navigator (II)
● The navigator has three sections
  – A Columns section
  – A Rows section
  – A Filters section
Controlling placement of dimensions on axes
● Clicking the small square before a dimension moves it from Rows to Columns, or vice versa
Slicing with the OLAP Navigator (I)
● A slicer corresponds to the MDX WHERE clause
  – Used to show only a subset (slice) of the data
● Clicking the funnel icon moves a dimension into the Filters section
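In MDX terms the slicer is a tuple of members in the WHERE clause. A hypothetical Python helper that adds one to an existing SELECT; the cube and member names are assumptions.

```python
def with_slicer(select_stmt, members):
    """Append a WHERE slicer: a tuple of members fixing the slice of the cube."""
    if not members:
        return select_stmt  # no filter dimensions: the query is unchanged
    return f"{select_stmt}\nWHERE ({', '.join(members)})"


base = ("SELECT {[Measures].[Unit Sales]} ON COLUMNS,\n"
        "  {[Product].[All Products].Children} ON ROWS\n"
        "FROM [Sales]")
sliced = with_slicer(base, ["[Time].[1997]", "[Store].[USA]"])
print(sliced)
```

The slicer dimensions do not appear on either axis; they only restrict which cells are aggregated.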
Slicing with the OLAP Navigator (II)
Specifying member sets
● It is also possible to specify particular members on columns and rows axes
MDX query pane
● You can also view the MDX query that represents the current state of the analysis view
  – Useful for learning MDX syntax
Export
● Print to PDF
● Export to MS Excel format
Charts
● JPivot can display data in a chart
● The chart can be configured
Alternative to JPivot
● Pentaho has a modular structure
  – It can be extended with new plugins
● Saiku
  – Provides a plugin for Pentaho offering lightweight OLAP features
  – Also provides a RESTful server that can connect to any OLAP system
  – http://analytical-labs.com
Saiku
● It can run OLAP analyses on any cube that has already been defined
● The analysis is based on defining what we want to see
  – By specifying which dimensions/measures go on columns, rows, and filters
    ● Through a drag 'n' drop UI
Defining the analysis (I)
● Once a cube has been selected the available dimensions (with hierarchies) and measures are listed
Defining the analysis (II)
● Then we can drag 'n' drop dimensions and measures onto columns, rows, and filters
  – The only restriction is that measures cannot be placed on both columns and rows
● After each change, the query is updated and executed automatically
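The layout restriction above can be expressed as a simple check. This is a hypothetical Python sketch of the rule as stated, not Saiku's code; the item and measure names are illustrative.

```python
def check_layout(columns, rows, measures):
    """Reject layouts that put measures on both the columns and the rows axis."""
    measures_on_columns = any(item in measures for item in columns)
    measures_on_rows = any(item in measures for item in rows)
    if measures_on_columns and measures_on_rows:
        return "measures cannot be placed on both columns and rows"
    return None  # layout accepted


MEASURES = {"Unit Sales", "Store Cost"}
ok = check_layout(["Unit Sales", "Store Cost"], ["Product", "Time"], MEASURES)
bad = check_layout(["Unit Sales"], ["Store Cost"], MEASURES)
print(ok, "|", bad)
```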
Defining the analysis (III)
Filtering
● Filters may be applied to visible (columns and rows) and invisible (filter) dimensions
Ordering
● Each dimension and/or measure can be used to order data
  – But not all combinations are allowed
    ● We cannot order by both a measure on columns and a dimension on rows (or vice versa)
Popup menus
● Some options for fast filtering and for adding/removing dimension levels are available by clicking on the column and row headers
Charts
● Data can also be shown in a chart
Statistics
● Saiku can also show some statistics about column values
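As an illustration of the kind of figures involved, here is a hypothetical Python helper that summarizes one result column with the standard statistics module, skipping empty cells. The set of statistics Saiku actually displays may differ from this sketch.

```python
import statistics


def column_stats(values):
    """Summary statistics for one result column, skipping empty cells (None)."""
    nums = [v for v in values if v is not None]
    return {
        "count": len(nums),
        "min": min(nums),
        "max": max(nums),
        "sum": sum(nums),
        "mean": statistics.mean(nums),
        # sample standard deviation needs at least two values
        "stdev": statistics.stdev(nums) if len(nums) > 1 else 0.0,
    }


stats = column_stats([10, 20, None, 30])
print(stats)
```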
Other commands
● Other available commands include:
  – Show MDX query
  – Drill through on cell
  – Export drill-through on cell to CSV
  – Export XLS
  – Export CSV
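A CSV export of a pivot result is essentially one header row plus one line per row position. A minimal sketch with Python's csv module; the layout (an empty top-left cell, then the column headers) and the sample figures are assumptions, and Saiku's actual export format may differ.

```python
import csv
import io


def pivot_to_csv(column_headers, row_headers, cells):
    """Serialize a 2-D pivot result (one row header per row) as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([""] + column_headers)  # empty top-left corner cell
    for header, values in zip(row_headers, cells):
        writer.writerow([header] + values)
    return buf.getvalue()


text = pivot_to_csv(["2011", "2012"], ["North", "South"],
                    [[200.0, 70.0], [50.0, 0.0]])
print(text)
```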
Saiku remarks
● Saiku is still in development
  – Some JPivot features are missing
  – Some features have bugs or malfunction
    ● Charts
    ● Drill through