
Page 1: Query, Analysis, and Visualization of Hierarchically Structured Data using Polaris Chris Stolte, Diane Tang, Pat Hanrahan July 2002

Query, Analysis, and Visualization of Hierarchically Structured Data using Polaris

Chris Stolte, Diane Tang, Pat Hanrahan

July 2002


Motivation

Large databases have become very common
• Corporate data warehouses
  • Amazon, Walmart,…
• Scientific projects:
  • Human Genome Project
  • Sloan Digital Sky Survey

Need tools to extract meaning from these databases
• Programmatic data mining/statistical analysis
• Visual exploration and analysis


Hierarchical Structure

Challenge: these databases are very large
• Queries cannot visit every record
• Visualizations cannot display every record

Analysts have augmented databases with hierarchical structure
• Provide meaningful levels of abstraction
• Leveraged by both computer and analyst
• Derived from semantics or programmatic analysis

Tools need to take advantage of these hierarchies


Contributions

Interactive tool for analysis of data warehouses with hierarchical structure
• Based on Polaris*
  • Rapid construction of table-based visualizations
  • Algebraic formalism
  • Analysis of flat relational databases
• To support hierarchies, we need to extend:
  • User interface
  • Algebraic formalism
  • Generation of data queries

* C. Stolte, D. Tang, and P. Hanrahan. Polaris: A System for Query, Analysis, and Visualization of Multi-dimensional Relational Databases. In IEEE Transactions on Visualization and Computer Graphics, January 2002.


Outline

• Review of Polaris
  • Demo
  • Formalism
• Hierarchies and Data Cubes
• Extensions to Polaris
  • Demo
  • Formalism
• Discussion


Schema: Denormalized Relation

Hypothetical nation-wide coffee chain data (courtesy Visual Insights)

Ordinal fields (categorical): Market, State, Year, Quarter, Month, Product Type, Product

Quantitative fields (metrics): Profit, Sales, Payroll, Marketing, Inventory, Margin, COGS, ...


Demo I: Original Polaris


Polaris Review

Provide an interface for rapidly and incrementally generating table-based graphical displays

Users construct visualizations via a drag-and-drop interface

Queries are automatically generated

Interface is simple and expressive because it is built upon a formalism


Polaris Formalism

UI interpreted as visual specification that defines:
• table configuration
• type of graphic in each pane
• encoding of data as visual properties of marks
• data transformations

Specification automatically compiled into necessary queries & drawing commands



Specifying Table Configurations

Interface: define table configuration by dropping fields on shelves

Formalism: shelf content interpreted as expressions in table algebra


Table Algebra

Operands are the database fields
• each operand interpreted as a set {…}
• quantitative and ordinal fields interpreted differently

Three operators:
• concatenation (+), cross (X), nest (/)


Table Algebra: Operands

Ordinal fields: interpret domain as a set that partitions table into rows and columns:

Quarter = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)}

Quantitative fields: treat domain as single element set and encode spatially as axes:

Profit = {(Profit[-410,650])}
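The two set interpretations above can be sketched in a few lines of Python. This is only an illustration, not Polaris code: the `Axis` class and the `ordinal` and `quantitative` helpers are hypothetical names, and sets are modeled as ordered lists of tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    """A quantitative field drawn as an axis over the range [lo, hi]."""
    name: str
    lo: float
    hi: float


def ordinal(values):
    """Ordinal field: one single-element tuple per domain value, in order."""
    return [(v,) for v in values]


def quantitative(name, lo, hi):
    """Quantitative field: a single-element set holding the whole axis."""
    return [(Axis(name, lo, hi),)]


quarter = ordinal(["Qtr1", "Qtr2", "Qtr3", "Qtr4"])  # four table partitions
profit = quantitative("Profit", -410, 650)           # one axis, no partitioning
```

An ordinal operand contributes one entry per domain value, while a quantitative operand always contributes exactly one entry regardless of its range.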


Concatenation (+) operator

Ordered union of set interpretations:

Quarter + ProductType

= {(Qtr1),(Qtr2),(Qtr3),(Qtr4)} + {(Coffee), (Espresso)}

= {(Qtr1),(Qtr2),(Qtr3),(Qtr4),(Coffee),(Espresso)}

Profit + Sales = {(Profit[-310,620]),(Sales[0,1000])}
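A minimal sketch of concatenation, assuming sets are modeled as ordered lists of tuples. The `concat` helper is a hypothetical name, and dropping repeated entries is an assumption about what "union" means here.

```python
def concat(a, b):
    """Ordered union: entries of a, followed by entries of b not already present."""
    out = list(a)
    for entry in b:
        if entry not in out:
            out.append(entry)
    return out


quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
product_type = [("Coffee",), ("Espresso",)]
result = concat(quarter, product_type)
```

The result lists the four quarters followed by the two product types, so a shelf holding `Quarter + ProductType` would produce six rows or columns.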


Cross (x) operator

Cross-product of set interpretations:

Quarter x ProductType =
{(Qtr1,Coffee), (Qtr1, Tea), (Qtr2, Coffee), (Qtr2, Tea), (Qtr3, Coffee), (Qtr3, Tea), (Qtr4, Coffee), (Qtr4,Tea)}

ProductType x Profit =
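The cross operator for two ordinal operands can be sketched with `itertools.product`. The `cross` helper is a hypothetical name and sets are again modeled as ordered lists of tuples.

```python
from itertools import product


def cross(a, b):
    """Cross product: join every tuple of a with every tuple of b, a varying slowest."""
    return [x + y for x, y in product(a, b)]


quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
product_type = [("Coffee",), ("Tea",)]
pairs = cross(quarter, product_type)
```

The left operand varies slowest, which matches the ordering of the `Quarter x ProductType` entries on the slide.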


Nest (/) operator

Quarter x Month
• would create twelve entries for each quarter, e.g., (Qtr1, December)

Quarter / Month
• would only create three entries per quarter
• based on tuples in database not semantics
• can be expensive to compute
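A minimal sketch of nest, assuming the relation is a list of dictionaries. The `nest` helper is a hypothetical name, and keeping pairs in first-appearance order is an assumption made for brevity.

```python
def nest(rows, a, b):
    """A / B: the (a, b) pairs that actually co-occur in the rows, without repeats."""
    seen = []
    for row in rows:  # every record is visited, which is why nest can be costly
        pair = (row[a], row[b])
        if pair not in seen:
            seen.append(pair)
    return seen


rows = [
    {"Quarter": "Qtr1", "Month": "Jan"},
    {"Quarter": "Qtr1", "Month": "Jan"},
    {"Quarter": "Qtr1", "Month": "Feb"},
    {"Quarter": "Qtr1", "Month": "Mar"},
    {"Quarter": "Qtr2", "Month": "Apr"},
]
pairs = nest(rows, "Quarter", "Month")
```

Only combinations present in the data appear, so a pair such as `("Qtr1", "Dec")` is never produced, but a month with no records would also be missing.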


Outline

• Review of Polaris
  • Demo
  • Formalism
• Hierarchies and Data Cubes
• Extensions to Polaris
  • Demo
  • Formalism
• Discussion


Data Cubes

Structure relation as n-dimensional cube

Each cell summarizes all measures for those dimension values

Each cube dimension corresponds to a dimension in the relation
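As a rough illustration of the cube structure, the sketch below groups records by their dimension values and keeps one summary per cell. The `build_cube` helper is hypothetical, and summing the measures is an assumption; other aggregates would work the same way.

```python
from collections import defaultdict


def build_cube(rows, dimensions, measures):
    """Map each combination of dimension values to the summed measures for that cell."""
    cube = defaultdict(lambda: dict.fromkeys(measures, 0))
    for row in rows:
        cell = tuple(row[d] for d in dimensions)
        for m in measures:
            cube[cell][m] += row[m]
    return dict(cube)


rows = [
    {"Quarter": "Qtr1", "ProductType": "Coffee", "Profit": 10, "Sales": 100},
    {"Quarter": "Qtr1", "ProductType": "Coffee", "Profit": 20, "Sales": 150},
    {"Quarter": "Qtr2", "ProductType": "Tea", "Profit": 5, "Sales": 80},
]
cube = build_cube(rows, ["Quarter", "ProductType"], ["Profit", "Sales"])
```

Here the cube is two-dimensional, and the cell `("Qtr1", "Coffee")` summarizes two underlying records.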


Hierarchies and Data Cubes

Each dimension in the cube is structured as a tree

Each level in tree corresponds to level of detail

Nodes correspond to domain values


Hierarchies and Data Cubes

Some hierarchies known a priori
• Provide semantic meaning
• Time (day, month, year)
• Location (city, state, country)

Can be automatically generated
• Classification algorithms
• Clustering

Enable analyst to reason at high level of abstraction then drill down
• Interface must expose underlying hierarchical structure


Hierarchy Model

Our model assumes that hierarchies:
• Can be modeled using star or snowflake schema
• Have uniform depth
• Have homogeneous node types

Other models relax these constraints

Chose to focus on model commonly found in commercial data warehouse and data cube products


Outline

• Review of Polaris
  • Demo
  • Formalism
• Hierarchies and Data Cubes
• Extensions to Polaris
  • Demo
  • Formalism
• Discussion


Schema: Star Schema

Fact table: State, Month, Product, Profit, Sales, Payroll, Marketing, Inventory, Margin, COGS, ...
• Measures: Profit, Sales, Payroll, Marketing, Inventory, Margin, COGS, ...

Dimension tables:
• Location: Market, State
• Time: Year, Quarter, Month
• Products: Product Type, Product Name
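A star schema of this shape can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact`, `time_dim`) and the sample rows are hypothetical, and the Time dimension is keyed on month alone, which only works because the sample covers a single year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (month TEXT PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE fact (state TEXT, month TEXT, product TEXT, profit REAL, sales REAL);
INSERT INTO time_dim VALUES
  ('Jan', 'Qtr1', 1998), ('Feb', 'Qtr1', 1998), ('Apr', 'Qtr2', 1998);
INSERT INTO fact VALUES
  ('CA', 'Jan', 'Latte', 10, 100),
  ('CA', 'Feb', 'Latte', 20, 150),
  ('NY', 'Apr', 'Latte', 5, 80);
""")

# Roll the fact rows up to the Quarter level by joining through the Time dimension.
rows = conn.execute("""
SELECT t.quarter, SUM(f.profit)
FROM fact AS f JOIN time_dim AS t ON f.month = t.month
GROUP BY t.quarter
ORDER BY t.quarter
""").fetchall()
```

The fact table stores only the finest level (Month); coarser levels such as Quarter or Year are reached through the dimension table.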


Demo II: Revised Polaris


Extending the Formalism

Redefine operands as dimension levels and measures, not simply database fields

Need to define set interpretation of a dimension level
• Domain is not a single ordered list
• Composed of node values at particular level in hierarchy
• Node values are uniquely defined by the path from root node

Possible definitions?


Set Interpretation: Option 1

Define set interpretation by listing each node value with unique path to root:

{1998.Qtr1.Jan, …, 1998.Qtr4.Dec}

(+) Provides unique set interpretation

(-) Limits expressiveness
• Any table including “Months” must include “Year”
• Not possible to summarize across years (e.g., Total Sales in January for all Years)
• Not a standard projection of data cube but very useful
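Option 1 can be sketched as a depth-first walk that qualifies every leaf with its path from the root. The nested-dictionary representation of the Time hierarchy and the `qualified_members` helper are assumptions made for illustration.

```python
def qualified_members(tree, prefix=()):
    """Depth-first list of leaf values, each written as its full path from the root."""
    out = []
    for value, children in tree.items():
        path = prefix + (str(value),)
        if children:
            out.extend(qualified_members(children, path))
        else:
            out.append(".".join(path))
    return out


time = {
    1998: {"Qtr1": {"Jan": {}, "Feb": {}, "Mar": {}}, "Qtr2": {"Apr": {}}},
    1999: {"Qtr1": {"Jan": {}}},
}
members = qualified_members(time)
```

`1998.Qtr1.Jan` and `1999.Qtr1.Jan` are distinct members under this definition, which is why a total for January across all years cannot be expressed.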


Set Interpretation: Option 2

Define set interpretation by listing each node value without path to root:

{Jan, Feb, …, Dec}

Order by depth first traversal

Consolidate non-unique values

This works—but how do we leverage known relationship between dimension levels?
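Option 2 can be sketched as a depth-first traversal to the requested level that keeps only the first occurrence of each value. The nested-dictionary hierarchy and the `level_members` helper are hypothetical, used only to make the ordering and consolidation rules concrete.

```python
def level_members(tree, depth):
    """Unqualified values at `depth` (0 = top level), in depth-first order,
    with repeated values consolidated into their first occurrence."""
    out = []

    def visit(node, d):
        for value, children in node.items():
            if d == depth:
                if value not in out:
                    out.append(value)
            else:
                visit(children, d + 1)

    visit(tree, 0)
    return out


time = {
    1998: {"Qtr1": {"Jan": {}, "Feb": {}, "Mar": {}}, "Qtr2": {"Apr": {}}},
    1999: {"Qtr1": {"Jan": {}}},
}
months = level_members(time, 2)
quarters = level_members(time, 1)
```

Because `Jan` from 1998 and `Jan` from 1999 collapse into one entry, a table built on this set can summarize January across all years.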


Dot (.) Operator

Nest isn’t aware of defined hierarchical relationships:
• Year / Months might work, if all data present
• Inefficient

New operator: Dot (.)
• Nest computed using the dimension table rather than the fact table

Sufficient to provide support for aggregation, drill down, and roll up in algebra.
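The difference between nest and dot can be sketched by running the same pairing logic over two sources. The `pairs_from` helper and the sample tables are hypothetical; the point is only which table is scanned.

```python
def pairs_from(rows, parent, child):
    """Ordered (parent, child) pairs that co-occur in the given rows, without repeats."""
    out = []
    for row in rows:
        pair = (row[parent], row[child])
        if pair not in out:
            out.append(pair)
    return out


# Dimension table: small, and complete by construction.
time_dim = [{"Year": 1998, "Month": m} for m in ("Jan", "Feb", "Mar")]

# Fact table: potentially huge, and here it has no February records.
fact = [
    {"Year": 1998, "Month": "Jan", "Sales": 10},
    {"Year": 1998, "Month": "Mar", "Sales": 7},
]

dot_result = pairs_from(time_dim, "Year", "Month")  # Year.Month
nest_result = pairs_from(fact, "Year", "Month")     # Year / Month
```

Reading the dimension table yields every month even when the fact table has gaps, and it avoids scanning the fact records.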


Generating Queries

Queries generated from specification.

Panes correspond to either a slice of a projection or an aggregation of a projection.

Multiple queries required if level-of-detail varies.

Algebraic manipulation can be used to determine minimal set of queries.

Interpreter generates SQL, MDX, or Rivet queries.
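As a rough sketch of query generation, the function below turns a list of dimension levels and measures into a SQL aggregation and runs it with the built-in `sqlite3` module. It is not the Polaris interpreter: `build_query`, the `sales` table, and its columns are hypothetical, `SUM` is an assumed aggregate, and identifiers are interpolated directly, so they must come from a trusted specification.

```python
import sqlite3


def build_query(table, levels, measures):
    """Return SQL that aggregates `measures` at the level of detail given by `levels`."""
    cols = ", ".join(levels)
    aggs = ", ".join(f"SUM({m}) AS {m}" for m in measures)
    return f"SELECT {cols}, {aggs} FROM {table} GROUP BY {cols} ORDER BY {cols}"


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (quarter TEXT, month TEXT, profit REAL);
INSERT INTO sales VALUES ('Qtr1', 'Jan', 10), ('Qtr1', 'Feb', 20), ('Qtr2', 'Apr', 5);
""")

# Two levels of detail need two queries.
by_quarter = conn.execute(build_query("sales", ["quarter"], ["profit"])).fetchall()
by_month = conn.execute(build_query("sales", ["quarter", "month"], ["profit"])).fetchall()
```

A table whose panes mix quarter-level and month-level detail would need both result sets, which is the sense in which multiple queries are required when level-of-detail varies.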


Related Visualization Projects

Formalisms for Graphics
• Wilkinson’s Grammar of Graphics
• Bertin’s Semiology of Graphics
• Mackinlay’s APT

Visual Exploration of Databases• VQE, DeVise, Visage, DataSplash/Tioga-2,

Visualization and Data Mining• MineSet, …


Data Mining and Visualization

Polaris not solely for visual analysis
• Precursor to algorithmic analysis to identify areas of interest
• Validate results and establish trust and understanding
• Incorporate decision trees and classification algorithms into data warehouses as hierarchies


Summary

Extended Polaris to fully support and expose hierarchical structure of data cubes

Extended not only interface but underlying algebraic formalism


Future Work

Use underlying formalism as basis for other visualization tools
• Interactive pan-and-zoom systems


Future Work

Visual presentation of metadata
• Hierarchies are one example of rich, domain specific metadata
• As important to analysis as data itself
• How to visualize this metadata?


Future Work

Interactive visualization

Prefetching and Caching