Visualization and Computer Graphics Lab, Jacobs University

6. Multidimensional Data (Part 2: Non-linear Projections and Interactive Analysis)

Upload: phamkien

Post on 11-Mar-2018



6.1 Dimensionality Reduction: Non-linear Approaches


Data Analytics 202


Non-linear projections

• Linear projections cannot detect low-dimensional curved features (e.g., manifolds) in a high-dimensional space.

• Non-linear projections can be used to reproduce the nonlinear high-dimensional features in lower-dimensional spaces.


PCA on curved manifolds


Kernel PCA

• Non-linearize PCA.

• Use a non-linear transformation Φ(x).

• The kernel represents inner products (and thereby distances) in the transformed space.

• Define the correlation matrix as


Kernel PCA

• Example for Gaussian kernel:
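The formulas on these slides did not survive in the transcript, so the following is only a minimal sketch of kernel PCA with a Gaussian kernel under the common formulation: build the kernel matrix, center it, and take the eigenvectors with the largest eigenvalues. The function name `gaussian_kernel_pca`, the bandwidth parameter `sigma`, and the concentric-circle test data are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel_pca(X, n_components=2, sigma=1.0):
    """Kernel PCA of the rows of X with a Gaussian kernel (illustrative sketch)."""
    # Pairwise squared Euclidean distances between all samples.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against round-off
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))

    # Center the kernel matrix, i.e. center the data in the transformed space.
    n = K.shape[0]
    ones = np.full((n, n), 1.0 / n)
    K_centered = K - ones @ K - K @ ones + ones @ K @ ones

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Coordinates of the samples along the kernel principal axes.
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

# Hypothetical test data: two concentric circles, which linear PCA cannot separate.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, 100)
radii = np.repeat([1.0, 4.0], 50)
X_circles = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
Y_circles = gaussian_kernel_pca(X_circles, n_components=2, sigma=1.0)
```

The result depends strongly on `sigma`, so several settings should be tried.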


Locally Linear Embedding (LLE)

• Non-linear manifold method:


Locally Linear Embedding (LLE)

• Assuming a locally linear representation, each point can be described as a linear combination of its neighbors:

• Compute the weights using least squares:

• Maintain the weights during projection:
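The three steps can be sketched as follows. This is a minimal illustration of the standard LLE formulation, not necessarily the exact formulas shown on the slide: the weights of each point are obtained from a small regularized least-squares problem over its k nearest neighbors and normalized to sum to one, and the low-dimensional coordinates are the eigenvectors of (I − W)ᵀ(I − W) with the smallest non-zero eigenvalues. The names `lle`, `n_neighbors`, and `reg`, and the spiral test data, are assumptions.

```python
import numpy as np

def lle(X, n_neighbors=8, n_components=2, reg=1e-3):
    """Locally linear embedding of the rows of X (illustrative sketch)."""
    n = X.shape[0]
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbors; index 0 of the sorted list is the point itself.
        nbrs = np.argsort(dists[i])[1:n_neighbors + 1]
        Z = X[nbrs] - X[i]                 # neighbors relative to point i
        G = Z @ Z.T                        # local Gram matrix
        G += reg * np.trace(G) * np.eye(n_neighbors)  # regularize (assumes distinct points)
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nbrs] = w / w.sum()           # weights sum to one
    # Keep the same weights in the low-dimensional space:
    # minimize sum_i ||y_i - sum_j W_ij y_j||^2.
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    eigvals, eigvecs = np.linalg.eigh(M)
    # Skip the first eigenvector (constant, eigenvalue close to zero).
    return W, eigvecs[:, 1:n_components + 1]

# Hypothetical test data: a spiral sheet embedded in 3D.
rng = np.random.default_rng(0)
t = np.linspace(0.5, 3.0 * np.pi, 120)
X_spiral = np.column_stack([t * np.cos(t), t * np.sin(t), rng.uniform(0.0, 5.0, 120)])
W_spiral, Y_spiral = lle(X_spiral, n_neighbors=8, n_components=2)
```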


Locally Linear Embedding (LLE)

• Local distortions, but globally keeping manifold


Isomap

• Uses geodesic instead of Euclidean distance:

• This geodesic distance is approximated by determining the neighborhood of each point, building a graph that connects neighboring points with their distances as edge weights, and computing shortest-path distances on that graph (Dijkstra's shortest path algorithm).

• Then, perform an MDS with these geodesic distances.
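The pipeline above can be sketched in a few functions. This is a minimal illustration that assumes a symmetric k-nearest-neighbor graph and classical MDS as the final step; the function names and the half-circle test data are hypothetical.

```python
import heapq
import numpy as np

def knn_graph(X, n_neighbors):
    """Symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    adjacency = [dict() for _ in range(X.shape[0])]
    for i in range(X.shape[0]):
        for j in np.argsort(dists[i])[1:n_neighbors + 1]:  # skip the point itself
            adjacency[i][int(j)] = float(dists[i, j])
            adjacency[int(j)][i] = float(dists[i, j])
    return adjacency

def dijkstra(adjacency, source):
    """Shortest-path distances from source to all nodes."""
    dist = np.full(len(adjacency), np.inf)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adjacency[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def isomap(X, n_neighbors=6, n_components=2):
    adjacency = knn_graph(X, n_neighbors)
    geo = np.array([dijkstra(adjacency, s) for s in range(len(adjacency))])
    if not np.all(np.isfinite(geo)):
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")
    # Classical MDS on the geodesic distances (double centering + eigenvectors).
    n = geo.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    B = -0.5 * J @ (geo ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    return geo, eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

# Hypothetical test data: points on a half circle of radius 1.
theta = np.linspace(0.0, np.pi, 50)
X_arc = np.column_stack([np.cos(theta), np.sin(theta)])
geo_arc, Y_arc = isomap(X_arc, n_neighbors=2, n_components=1)
```

For the two end points of the arc, the Euclidean distance is 2, while the graph distance approximates the arc length π, which is what the one-dimensional embedding then reproduces.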


Isomap


Multidimensional Scaling (MDS)

• Goal: Find a set of points in a lower-dimensional space whose pairwise distances match those measured in the high-dimensional space (or, in general, distances given in the form of a distance matrix).


MDS

• Metric MDS tries to minimize the difference between distances in the original space and the projected space:

$$E_M = \sum_{k \neq l} [d(k,l) - d'(k,l)]^2$$

• This serves as an objective function that needs to be minimized.

• Nonmetric MDS introduces a monotonic function f to define the objective function:

$$E_N = \frac{1}{\sum_{k \neq l} [d'(k,l)]^2} \sum_{k \neq l} [f(d(k,l)) - d'(k,l)]^2$$


Classical MDS vs. PCA

• Classical MDS uses Euclidean distance.

• Classical MDS produces the same result as PCA.

• PCA uses the covariance matrix C = (1/N) X^T X, while MDS uses the Gram matrix G = X X^T to compute the eigenvectors with the largest eigenvalues, which represent the axes of the projected space.
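The equivalence can be checked numerically. The following sketch assumes that X holds one centered sample per row; the random test data and all variable names are made up for illustration. The two projections agree up to the sign of each axis.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 30 samples, 5 attributes with clearly different variances.
X = rng.normal(size=(30, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
X = X - X.mean(axis=0)  # both methods assume centered data
N = X.shape[0]

# PCA: eigenvectors of the covariance matrix, then project the samples.
C = X.T @ X / N
c_vals, c_vecs = np.linalg.eigh(C)          # ascending eigenvalues
V = c_vecs[:, ::-1][:, :2]                  # two leading principal axes
Y_pca = X @ V

# Classical MDS: eigenvectors of the Gram matrix, scaled by sqrt(eigenvalue).
G = X @ X.T
g_vals, g_vecs = np.linalg.eigh(G)
U = g_vecs[:, ::-1][:, :2]
lam = g_vals[::-1][:2]
Y_mds = U * np.sqrt(lam)

# Eigenvectors are only defined up to sign, so compare absolute values.
same_up_to_sign = np.allclose(np.abs(Y_pca), np.abs(Y_mds))
```

The non-zero eigenvalues of G equal N times those of C, which is why the scaling by the square root of the eigenvalue recovers the PCA coordinates.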


Self-Organizing Maps (SOM)

• Algorithm that performs clustering and non-linear projection onto lower dimension at the same time

• Finds and orders a set of reference vectors located on a discrete lattice

• Learning rule:

$$m_i(t+1) = m_i(t) + h_{ci}(t) [x(t) - m_i(t)]$$

• Objective function uses a neighborhood kernel:

$$E_{SOM} = \sum_k \sum_i h_{ci} \| x_k - m_i \|^2$$
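A minimal sketch of the learning rule on a rectangular lattice is given below. It assumes that c is the index of the reference vector closest to the current sample and that the neighborhood kernel h_ci(t) is a Gaussian on the lattice multiplied by a decaying learning rate; the linear decay schedules, the function names, and the two-cluster test data are assumptions.

```python
import numpy as np

def train_som(X, grid_shape=(5, 5), n_steps=2000, alpha0=0.5, sigma0=2.0, seed=0):
    """Train a SOM; returns the initial and the trained reference vectors."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Fixed lattice positions of the units.
    lattice = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    m = rng.normal(size=(rows * cols, X.shape[1]))   # random initial reference vectors
    m_init = m.copy()
    for t in range(n_steps):
        frac = t / n_steps
        alpha = alpha0 * (1.0 - frac)                 # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac    # shrinking neighborhood radius
        x = X[rng.integers(len(X))]                   # random sample x(t)
        c = np.argmin(np.linalg.norm(m - x, axis=1))  # best-matching unit
        lattice_sq = np.sum((lattice - lattice[c]) ** 2, axis=1)
        h = alpha * np.exp(-lattice_sq / (2.0 * sigma ** 2))  # h_ci(t)
        m += h[:, None] * (x - m)                     # m_i(t+1) = m_i(t) + h_ci(t)[x(t) - m_i(t)]
    return m_init, m

def quantization_error(X, m):
    """Mean distance of each sample to its closest reference vector."""
    d = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Hypothetical test data: two well-separated 2D clusters.
rng = np.random.default_rng(42)
X_som = np.vstack([rng.normal(-3.0, 0.5, size=(100, 2)),
                   rng.normal(3.0, 0.5, size=(100, 2))])
m_before, m_after = train_som(X_som)
qe_before = quantization_error(X_som, m_before)
qe_after = quantization_error(X_som, m_after)
```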


MDS vs. SOM

• MDS tries to preserve the metric (ordering relations) of the original space; long distances dominate over shorter ones.

• SOM tries to preserve the topology (local neighborhood relations); items projected to nearby locations are similar.


PCA vs SOM


Example: PCA vs SOM


Design goals

• Distance preservation

• Neighborhood preservation

• Cluster preservation

• Cluster segregation

• Clutter avoidance (using screen space)


Quality measures

Distance preservation:

• Stress

• Correlation coefficient

• Distance plots

Neighborhood preservation:

• Neighborhood hit

Cluster preservation and segregation:

• Silhouette coefficient
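Three of these measures can be sketched in a few lines. Several definitions of stress are in use; the normalized variant below, the Pearson correlation of the pairwise distances, and the k-nearest-neighbor form of the neighborhood hit are assumptions, as are all function names and the test data.

```python
import numpy as np

def pairwise_distances(P):
    return np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)

def normalized_stress(X_high, Y_low):
    """0 means all pairwise distances are preserved exactly."""
    d_high = pairwise_distances(X_high)
    d_low = pairwise_distances(Y_low)
    return np.sqrt(np.sum((d_high - d_low) ** 2) / np.sum(d_high ** 2))

def distance_pearson(X_high, Y_low):
    """Correlation between original and projected pairwise distances."""
    iu = np.triu_indices(len(X_high), k=1)  # each pair once
    return np.corrcoef(pairwise_distances(X_high)[iu],
                       pairwise_distances(Y_low)[iu])[0, 1]

def neighborhood_hit(Y_low, labels, k=5):
    """Average fraction of the k nearest projected neighbors sharing the label."""
    d = pairwise_distances(Y_low)
    np.fill_diagonal(d, np.inf)             # a point is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]
    return np.mean(labels[nbrs] == labels[:, None])

# Hypothetical test data: two separated clusters lying in the plane z = 0,
# so that dropping z is a perfect projection.
rng = np.random.default_rng(3)
Y_plane = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
                     rng.normal(10.0, 0.5, size=(20, 2))])
X_space = np.hstack([Y_plane, np.zeros((40, 1))])
labels = np.repeat([0, 1], 20)

stress_value = normalized_stress(X_space, Y_plane)
corr_value = distance_pearson(X_space, Y_plane)
hit_value = neighborhood_hit(Y_plane, labels, k=5)
```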


6.2 Star Coordinates


Motivation

• Looking at all configurations in a scatterplot matrix is a tedious process.

• Is there no way that we can depict d-dimensional points in a 2D or 3D visual system?


Star coordinates

• The main idea of a star coordinate system is to place a d-dimensional point in a d-dimensional coordinate system that is drawn in a 2D visual space.

• Hence, the d axes of the star coordinate system are linearly dependent.

• The axes emerge from an origin o and have distinct directions.

• Then, a d-dimensional point p_i is mapped to the 2D position reached by starting at o and moving along each axis direction by the corresponding coordinate of p_i.
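As a minimal sketch, the mapping can be written as a matrix product. The formula on the slide is not preserved in the transcript, so this assumes the usual star coordinate definition, position = o + Σ_j p_ij a_j, with unit axis vectors a_j spread evenly around the origin; the names `star_axes` and `star_coordinates` are hypothetical. In practice the attributes are often scaled to [0, 1] first.

```python
import numpy as np

def star_axes(d):
    """d unit axis vectors in 2D, evenly spaced around the origin; shape (d, 2)."""
    angles = 2.0 * np.pi * np.arange(d) / d
    return np.column_stack([np.cos(angles), np.sin(angles)])

def star_coordinates(P, axes, origin=(0.0, 0.0)):
    """Map each row of P (n x d) to a 2D position: origin + sum_j P[i, j] * axes[j]."""
    return np.asarray(origin, dtype=float) + np.asarray(P, dtype=float) @ axes

# Three 4-dimensional points; with 4 axes the directions are +x, +y, -x, -y.
P_demo = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [1.0, 0.0, 1.0, 0.0]])
pos_demo = star_coordinates(P_demo, star_axes(4))
```

The third point lands on the origin because its contributions along the first and third axis cancel.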


Star coordinates

• Placing a d-dimensional point in star coordinates:


Star coordinate layout

• The layout of the axes can be interactively changed or automatically generated.


Star coordinates versus projections

• Star coordinate plots represent a linear projection.

• Interacting with the axes of the plot changes the projection.


3D star coordinates

• The concept can be generalized to a 3D visual system (points are clustered and cluster boundaries rendered):


6.3 Star Glyphs


Motivation

• The mapping to star coordinates is not a one-to-one mapping.

• Hence, it is ambiguous.

• Different points are mapped to the same position.

• Star glyphs (also referred to as Kiviat diagrams) replace the point rendering with a polygon rendering.
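The ambiguity can be demonstrated numerically. The sketch below assumes the usual linear star coordinate mapping with evenly spaced unit axes collected in a d × 2 matrix `A` (a name introduced here for illustration): for d > 2 this matrix has a non-trivial null space, and adding any null-space vector to a point leaves its 2D position unchanged.

```python
import numpy as np

d = 5
angles = 2.0 * np.pi * np.arange(d) / d
A = np.column_stack([np.cos(angles), np.sin(angles)])  # assumed axis layout, shape (d, 2)

# Rows 2..d-1 of Vt span the null space of A.T (a 2 x d matrix of rank 2).
_, _, Vt = np.linalg.svd(A.T)
null_vector = Vt[2]

p_a = np.array([0.2, 0.9, 0.4, 0.7, 0.1])
p_b = p_a + 0.5 * null_vector          # a different 5-dimensional point

pos_a = p_a @ A
pos_b = p_b @ A                        # same 2D position as pos_a
```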


Star glyphs

• For each sample, one draws a polygon that connects points representing the values of the attributes for each axis in the given order.

• Hence a point pi is mapped to a polygon:
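The polygon formula is missing from the transcript; a minimal sketch under the usual construction is given below. It assumes evenly spaced unit axis directions, non-negative attribute values, and one polygon vertex per axis at distance p_ij from the origin. The function name `star_glyph_polygon` is hypothetical.

```python
import numpy as np

def star_glyph_polygon(p, origin=(0.0, 0.0)):
    """Closed polygon (d + 1 vertices) for one d-dimensional sample p."""
    p = np.asarray(p, dtype=float)
    d = len(p)
    angles = 2.0 * np.pi * np.arange(d) / d
    directions = np.column_stack([np.cos(angles), np.sin(angles)])
    # Vertex j lies on axis j at distance p[j] from the origin.
    vertices = np.asarray(origin, dtype=float) + p[:, None] * directions
    # Repeat the first vertex so that the outline is closed when drawn.
    return np.vstack([vertices, vertices[:1]])

glyph = star_glyph_polygon([1.0, 2.0, 3.0, 4.0])
```

Unlike the star coordinate position, the polygon keeps every attribute value readable, so two different samples always yield different glyphs.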


Star glyphs

• One sample vs. many samples:


3D star glyphs


6.4 Parallel Coordinates


Parallel coordinates

• Parallel coordinate layouts are similar to star glyphs, but they do not have an origin.

• Theoretically, the origin is placed at −infinity.

• Hence, all d coordinate axes are parallel.

• For each sample, one draws a polygonal line that connects the points on the axes that represent the sample's attribute values.
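The construction of the polygonal lines can be sketched as follows, assuming vertical axes at x = 0, 1, ..., d − 1 and a per-attribute min-max scaling to [0, 1]; both choices and the function name are assumptions.

```python
import numpy as np

def parallel_coordinate_polylines(X):
    """Return an array of shape (n_samples, d, 2) with the polyline vertices."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero for constant attributes
    heights = (X - lo) / span                # position along each axis, in [0, 1]
    axis_x = np.arange(X.shape[1], dtype=float)
    xs = np.broadcast_to(axis_x, heights.shape)
    return np.stack([xs, heights], axis=2)

X_demo = np.array([[1.0, 10.0, 100.0],
                   [2.0, 20.0, 300.0],
                   [3.0, 30.0, 200.0]])
lines = parallel_coordinate_polylines(X_demo)
```

Each `lines[i]` can be passed directly to a line-drawing routine, one polyline per sample.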


Parallel coordinates


Duality

• Line-point duality in parallel and Cartesian coordinates:
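The duality can be verified with a short calculation. Assume two parallel axes at horizontal positions 0 and `spacing`. A Cartesian point (x1, x2) becomes the line through (0, x1) and (spacing, x2). For all points on the Cartesian line x2 = m·x1 + b with m ≠ 1, these lines meet in the single point (spacing/(1 − m), b/(1 − m)). The function names below are hypothetical.

```python
import numpy as np

def dual_point(m, b, spacing=1.0):
    """Point in the parallel-coordinate plane dual to the line x2 = m*x1 + b."""
    if np.isclose(m, 1.0):
        raise ValueError("for m = 1 the lines are parallel and meet only at infinity")
    return np.array([spacing / (1.0 - m), b / (1.0 - m)])

def line_height_at(x1, x2, x, spacing=1.0):
    """Height at horizontal position x of the line through (0, x1) and (spacing, x2)."""
    return x1 + (x2 - x1) * x / spacing

m_demo, b_demo = -1.0, 2.0
dual = dual_point(m_demo, b_demo)                       # array([0.5, 1.0])
x1_values = np.array([-2.0, 0.0, 1.0, 3.5])
x2_values = m_demo * x1_values + b_demo                 # points on the Cartesian line
heights = line_height_at(x1_values, x2_values, dual[0]) # all equal to dual[1]
```

For negative slopes the dual point lies between the two axes, which is why negative correlations show up as a crossing pattern.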


Duality

• Line-point duality in parallel and Cartesian coordinates:


Examples

• Few vs. many samples:


Clutter

• For many samples, parallel coordinate plots may become cluttered.

• One way of reducing clutter is by interactive selection on the individual axes.

• Another way is to cluster the samples and to draw clusters.
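Interactive selection on the axes (often called brushing) amounts to a conjunction of range tests. The sketch below assumes that a selection is given as a dictionary from axis index to a (low, high) interval; this representation and the function name `brush` are assumptions.

```python
import numpy as np

def brush(X, ranges):
    """Boolean mask of the samples that lie inside every selected axis interval."""
    X = np.asarray(X, dtype=float)
    mask = np.ones(len(X), dtype=bool)
    for axis, (low, high) in ranges.items():
        mask &= (X[:, axis] >= low) & (X[:, axis] <= high)
    return mask

X_brush = np.array([[1.0, 10.0, 5.0],
                    [2.0, 20.0, 6.0],
                    [3.0, 30.0, 7.0],
                    [4.0, 15.0, 8.0]])
# Select samples with attribute 0 in [2, 4] and attribute 1 in [10, 20].
selected = brush(X_brush, {0: (2.0, 4.0), 1: (10.0, 20.0)})
```

Only the polylines with `selected[i]` set to true would then be drawn or highlighted.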


Interactive selection


Interactive selection


Interactive selection


Clustering


Scatterplot matrix vs. Parallel Coordinates


Dimension reordering

• Star glyphs and parallel coordinates represent the values of all attributes.

• However, the order is still important.

• Correlations become obvious only for neighboring axes.
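One possible heuristic for choosing an axis order, not necessarily the one used on the slides, is to place strongly correlated attributes next to each other. The sketch below greedily builds a chain that starts at one axis of the most correlated pair and always appends the remaining axis with the highest absolute correlation to the last one. It assumes that no attribute is constant; the function name and test data are hypothetical.

```python
import numpy as np

def greedy_axis_order(X):
    """Order the columns of X so that neighboring axes are strongly correlated."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(corr, -1.0)                       # ignore self-correlation
    start = int(np.unravel_index(np.argmax(corr), corr.shape)[0])
    order = [start]
    remaining = set(range(corr.shape[0])) - {start}
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda j: corr[last, j])
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Hypothetical data: attributes 0 and 2 are correlated, as are attributes 1 and 3.
rng = np.random.default_rng(7)
a = rng.normal(size=100)
b = rng.normal(size=100)
X_order = np.column_stack([a, b, a + 0.01 * rng.normal(size=100),
                           -b + 0.01 * rng.normal(size=100)])
axis_order = greedy_axis_order(X_order)
```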


Dimension reordering


6.5 Summary on Visual Analytics of Multi-dimensional Data


Multidimensional data visualization

• Multidimensional data can be displayed by rendering sets of points in projected systems as in
  – scatterplot matrices,
  – scatterplots in projected spaces, or
  – star coordinates

  or by rendering polygons that intersect all axes as in
  – parallel coordinates or
  – star glyphs.


Scalability in dimensions

• Sorted from worst to best:
  – scatterplot matrices
  – star coordinates
  – star glyphs
  – parallel coordinates

• Order is an issue.


Scalability in number of samples

• Point renderings work better than line renderings.

• Clustering can help.

• Interaction can help.


6.6 Including non-quantitative attributes


Lattice plots


Small Multiples


Small Multiples


Small Multiples


Heatmap


6.7 Assignment


Assignment 4

Consider, once more, the six continuous numerical attributes of the Automobile Data Set as in Assignments 2 and 3.

1. Apply a spherical scaling to the dimensions and create a SPLOM.

2. Apply a PCA approach (with spherical scaling) to the 6D space.

3. Determine the intrinsic dimensionality.

4. Plot the projection to the 2D space spanned by two of the principal components. Allow for interactive reconfiguration of which two components are used. Analyze the projected spaces visually. Do you see any clusters? Which? Allow for selecting clusters/outliers/groups and highlighting them in the SPLOM to analyze them further.

5. Apply a kernel PCA with a suitable kernel function. Perform a similar analysis as in 4. Do you get any new insights? Try different kernel settings.