Visualization and Computer Graphics Lab, Jacobs University
6. Multidimensional Data (Part 2: Non-linear Projections and Interactive Analysis)
6.1 Dimensionality Reduction: Non-linear Approaches
Non-linear projections
• Linear projections cannot detect low-dimensional curved features (e.g., manifolds) in a high-dimensional space.
• Non-linear projections can be used to reproduce the nonlinear high-dimensional features in lower-dimensional spaces.
PCA on curved manifolds
Kernel PCA
• Non-linearize PCA
• Use a non-linear transformation Φ(x)
• Kernel represents distances in the transformed space
• Define the correlation matrix as
Kernel PCA
• Example for Gaussian kernel:
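As a rough illustration of how kernel PCA with a Gaussian kernel could be computed, here is a minimal NumPy sketch. It never evaluates Φ(x) explicitly; it only uses the kernel matrix K with entries k(x_i, x_j) = Φ(x_i)·Φ(x_j). The function name `gaussian_kernel_pca` and the bandwidth parameter `sigma` are assumptions made for this example and do not come from the slides.

```python
import numpy as np

def gaussian_kernel_pca(X, n_components=2, sigma=1.0):
    """Project the rows of X onto the leading kernel principal components."""
    # Pairwise squared Euclidean distances between all samples.
    sq = np.sum(X**2, axis=1)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    K = np.exp(-sq_dists / (2.0 * sigma**2))
    # Center the kernel matrix, which centers Phi(x) in the transformed space.
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam, V = eigvals[idx], eigvecs[:, idx]
    # Coordinates of the samples along the kernel principal axes.
    return V * np.sqrt(np.maximum(lam, 0.0))
```

The choice of `sigma` strongly influences the result, so it is worth trying several values on the same data.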
Locally Linear Embedding (LLE)
• Non-linear manifold method:
Locally Linear Embedding (LLE)
• Assuming a locally linear representation, each point can be described as a linear combination of its neighborhood:
• Compute weights using least squares:
• Maintain the weights during projection:
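The three LLE steps above can be sketched in NumPy as follows. This is a minimal version under stated assumptions: the function name `lle`, the neighborhood size `n_neighbors`, and the small regularization term `reg` (added so the local least-squares systems stay solvable) are choices for this example, not part of the slides.

```python
import numpy as np

def lle(X, n_neighbors=5, n_components=2, reg=1e-3):
    """Return the LLE embedding of the rows of X and the weight matrix W."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        # Step 1: k nearest neighbors of x_i (index 0 is x_i itself).
        dist = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(dist)[1:n_neighbors + 1]
        # Step 2: least-squares weights that reconstruct x_i from its
        # neighbors, subject to the weights summing to one.
        Z = X[nbrs] - X[i]
        G = Z @ Z.T
        G = G + reg * np.trace(G) * np.eye(n_neighbors)
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nbrs] = w / w.sum()
    # Step 3: keep W fixed and find low-dimensional points y_i that
    # minimize sum_i ||y_i - sum_j W_ij y_j||^2.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    eigvals, eigvecs = np.linalg.eigh(M)
    # The eigenvector with the smallest eigenvalue is constant; skip it.
    return eigvecs[:, 1:n_components + 1], W
```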
Locally Linear Embedding (LLE)
• Local distortions may occur, but the manifold is preserved globally
Isomap
• Uses geodesic instead of Euclidean distance:
• The geodesic distance is approximated by determining the neighbourhood of each point, building a graph that connects neighbouring points with their distances as edge weights, and computing shortest-path distances on that graph (Dijkstra's shortest path algorithm).
• Then, perform an MDS with these geodesic distances.
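The geodesic approximation can be sketched with the standard library alone. The function name `geodesic_distances` and the neighborhood size `k` are assumptions for this example; the neighborhood graph is made symmetric here, which is one of several possible conventions.

```python
import heapq
import math

def geodesic_distances(points, k=2):
    """Approximate geodesic distances via shortest paths on a k-NN graph."""
    n = len(points)
    # Neighborhood graph: connect each point to its k nearest neighbors,
    # using the Euclidean distance as edge weight.
    graph = [dict() for _ in range(n)]
    for i in range(n):
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d
    # Dijkstra's algorithm from every source node.
    D = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        D[s][s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > D[s][u]:
                continue  # outdated heap entry
            for v, w in graph[u].items():
                if d + w < D[s][v]:
                    D[s][v] = d + w
                    heapq.heappush(heap, (d + w, v))
    return D
```

For points sampled along a half circle, the graph distance between the two end points is larger than their straight-line distance, because the path has to follow the curve. The resulting matrix `D` would then be passed to an MDS.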
Multidimensional Scaling (MDS)
• Goal: Find a set of points in a lower-dimensional space whose pairwise distances match those measured in high-dimensional space (or, in general, distances given in form of a distance matrix).
MDS
• Metric MDS tries to minimize the difference between distances in the original space and the projected space:
$$E_M = \sum_{k \neq l} [d(k,l) - d'(k,l)]^2$$
• This serves as an objective function that needs to be minimized.
• Nonmetric MDS introduces a monotonic function f to define the objective function:
$$E_N = \frac{1}{\sum_{k \neq l} [d'(k,l)]^2} \sum_{k \neq l} [f(d(k,l)) - d'(k,l)]^2$$
Here d(k,l) denotes the distance between samples k and l in the original space and d'(k,l) their distance in the projected space.
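To make the metric MDS objective concrete, the following sketch evaluates the sum of squared distance differences and reduces it by plain gradient descent. Gradient descent is only one possible optimizer and is an assumption of this example, as are the names `pairwise`, `metric_stress`, `metric_mds` and the parameters `n_iter`, `lr`, and `seed`.

```python
import numpy as np

def pairwise(Y):
    """Euclidean distance matrix of the rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

def metric_stress(D, Y):
    """Sum over k != l of [d(k,l) - d'(k,l)]^2 (the diagonal adds zero)."""
    return float(((D - pairwise(Y))**2).sum())

def metric_mds(D, n_components=2, n_iter=500, lr=0.01, seed=0):
    """Minimize the metric MDS objective by gradient descent."""
    rng = np.random.default_rng(seed)
    Y = rng.normal(size=(D.shape[0], n_components))
    for _ in range(n_iter):
        Dp = pairwise(Y)
        np.fill_diagonal(Dp, 1.0)  # avoid division by zero on the diagonal
        coef = (Dp - D) / Dp
        np.fill_diagonal(coef, 0.0)
        # Gradient w.r.t. y_k: 4 * sum_l (d' - d) * (y_k - y_l) / d'.
        diff = Y[:, None, :] - Y[None, :, :]
        grad = 4.0 * (coef[:, :, None] * diff).sum(axis=1)
        Y = Y - lr * grad
    return Y
```

The step size `lr` has to be small relative to the magnitude of the distances; for large data sets a dedicated solver would be the more practical choice.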
Classical MDS vs. PCA
• Classical MDS uses Euclidean distance.
• Classical MDS produces the same result as PCA.
• PCA uses the covariance matrix C = (1/N) X^T X, while MDS uses the Gram matrix G = X X^T to compute the eigenvectors with the largest eigenvalues, which represent the axes of the projected space.
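The equivalence can be checked numerically. The sketch below assumes that the rows of X are the centered samples and uses random data purely for illustration; the two projections agree up to the sign of each axis.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X = X - X.mean(axis=0)  # center the data; rows are samples
N = X.shape[0]

# PCA: eigenvectors of the covariance matrix C = (1/N) X^T X.
C = X.T @ X / N
evals_c, evecs_c = np.linalg.eigh(C)
top_c = np.argsort(evals_c)[::-1][:2]
Y_pca = X @ evecs_c[:, top_c]

# Classical MDS: eigenvectors of the Gram matrix G = X X^T,
# scaled by the square roots of their eigenvalues.
G = X @ X.T
evals_g, evecs_g = np.linalg.eigh(G)
top_g = np.argsort(evals_g)[::-1][:2]
Y_mds = evecs_g[:, top_g] * np.sqrt(evals_g[top_g])

# Both projections coincide up to a sign flip per axis.
same_up_to_sign = np.allclose(np.abs(Y_pca), np.abs(Y_mds), atol=1e-8)
```

The non-zero eigenvalues of G are N times those of C, which is why both matrices lead to the same axes.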
Self-Organizing Maps (SOM)
• Algorithm that performs clustering and non-linear projection onto lower dimension at the same time
• Finds and orders a set of reference vectors located on a discrete lattice
• Learning rule:
$$m_i(t+1) = m_i(t) + h_{ci}(t)\,[x(t) - m_i(t)]$$
• Objective function uses a neighborhood kernel:
$$E_{SOM} = \sum_k \sum_i h_{ci} \, \| x_k - m_i \|^2$$
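One application of the learning rule might look as follows. The slides do not specify the neighborhood kernel, so this sketch assumes a Gaussian kernel on the lattice, scaled by a learning rate; the names `som_step`, `grid`, `alpha`, and `sigma` are hypothetical.

```python
import numpy as np

def som_step(M, grid, x, alpha=0.5, sigma=1.0):
    """Apply m_i(t+1) = m_i(t) + h_ci(t) [x(t) - m_i(t)] to all units.

    M    : (units, dims) reference vectors m_i
    grid : (units, 2) lattice positions of the units
    x    : (dims,) current input sample
    """
    # c is the index of the best-matching reference vector for x.
    c = int(np.argmin(np.linalg.norm(M - x, axis=1)))
    # Assumed kernel: Gaussian in lattice distance, scaled by alpha.
    lattice_d2 = ((grid - grid[c])**2).sum(axis=1)
    h = alpha * np.exp(-lattice_d2 / (2.0 * sigma**2))
    # Every unit moves toward x, the winner and its lattice neighbors most.
    return M + h[:, None] * (x - M)
```

In a full training run, this step would be repeated over many samples while `alpha` and `sigma` are gradually decreased.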
MDS vs. SOM
• MDS tries to preserve the metric (ordering relations) of the original space; long distances dominate over shorter ones.
• SOM tries to preserve the topology (local neighbourhood relations); items projected to nearby locations are similar.
PCA vs SOM
Example: PCA vs SOM
Design goals
• Distance preservation
• Neighborhood preservation
• Cluster preservation
• Cluster segregation
• Clutter avoidance (using screen space)
Quality measures
Distance preservation:
• Stress
• Correlation coefficient
• Distance plots
Neighborhood preservation:
• Neighborhood hit
Cluster preservation and segregation:
• Silhouette coefficient
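Two of the listed measures can be sketched in a few lines. Several definitions of stress are in use; the normalized variant below is one common choice and an assumption of this example. Neighborhood hit is taken here as the average fraction of the k nearest projected neighbors that share the class label of a point. The function names and the parameter `k` are hypothetical.

```python
import numpy as np

def pairwise_distances(A):
    """Euclidean distance matrix of the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

def normalized_stress(X, Y):
    """Squared distance differences, normalized by the original distances."""
    D, Dp = pairwise_distances(X), pairwise_distances(Y)
    return float(((D - Dp)**2).sum() / (D**2).sum())

def neighborhood_hit(Y, labels, k=3):
    """Average share of the k nearest projected neighbors with equal label."""
    Dp = pairwise_distances(Y)
    np.fill_diagonal(Dp, np.inf)  # a point is not its own neighbor
    nbrs = np.argsort(Dp, axis=1)[:, :k]
    return float((labels[nbrs] == labels[:, None]).mean())
```

A stress of 0 means all pairwise distances are reproduced exactly; a neighborhood hit of 1 means every point is surrounded only by points of its own class.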
6.2 Star Coordinates
Motivation
• Looking at all configurations in a scatterplot matrix is a tedious process.
• Is there no way that we can depict d-dimensional points in a 2D or 3D visual system?
Star coordinates
• The main idea of a star coordinate system is to place a d-dimensional point in a d-dimensional coordinate system that is drawn in a 2D visual space.
• Hence, the d axes of the star coordinate system are linearly dependent.
• The axes emerge from an origin o and have distinct directions $a_1, \dots, a_d$.
• Then, a d-dimensional point $p_i = (p_{i1}, \dots, p_{id})$ is mapped to $o + \sum_{j=1}^{d} p_{ij}\, a_j$.
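A minimal sketch of this mapping, assuming the common default layout in which the d axes are unit vectors evenly spaced on a circle. The names `star_axes` and `star_coordinates` are hypothetical, and attribute values are assumed to be normalized beforehand.

```python
import math

def star_axes(d):
    """Unit axis vectors a_1..a_d, evenly spaced around the origin."""
    return [
        (math.cos(2.0 * math.pi * j / d), math.sin(2.0 * math.pi * j / d))
        for j in range(d)
    ]

def star_coordinates(p, axes, origin=(0.0, 0.0)):
    """Map a d-dimensional point p to origin + sum_j p_j * a_j."""
    x = origin[0] + sum(pj * a[0] for pj, a in zip(p, axes))
    y = origin[1] + sum(pj * a[1] for pj, a in zip(p, axes))
    return (x, y)
```

With four axes, the points (1, 2, 1, 2) and (0, 0, 0, 0) both land on the origin, since opposite axes cancel each other.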
Star coordinates
• Placing a d-dimensional point in star coordinates:
Star coordinate layout
• The layout of the axes can be interactively changed or automatically generated.
Star coordinates versus projections
• Star coordinate plots represent a linear projection.
• Interacting with the axes of the plot changes the projection.
3D star coordinates
• The concept can be generalized to a 3D visual system (points are clustered and cluster boundaries rendered):
6.3 Star Glyphs
Motivation
• The mapping to star coordinates is not a one-to-one mapping.
• Hence, it is ambiguous.
• Different points are mapped to the same position.
• Star glyphs (also referred to as Kiviat diagrams) replace the point rendering with a polygon rendering.
Star glyphs
• For each sample, one draws a polygon that connects points representing the values of the attributes for each axis in the given order.
• Hence a point pi is mapped to a polygon:
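The polygon construction can be sketched as follows, assuming evenly spaced axes and attribute values that have been normalized to be non-negative. The function name `star_glyph` and the `center` parameter are hypothetical.

```python
import math

def star_glyph(p, center=(0.0, 0.0)):
    """Return the polygon vertices of the star glyph for sample p.

    Vertex j lies on axis j at a distance p[j] from the center; the
    polygon is closed by connecting the last vertex to the first.
    """
    d = len(p)
    vertices = []
    for j, value in enumerate(p):
        angle = 2.0 * math.pi * j / d
        vertices.append((
            center[0] + value * math.cos(angle),
            center[1] + value * math.sin(angle),
        ))
    return vertices
```

Unlike a single projected point, the polygon keeps one vertex per attribute, so samples such as (1, 2, 1, 2) and (0, 0, 0, 0) remain distinguishable.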
Star glyphs
• One sample vs. many samples:
3D star glyphs
6.4 Parallel Coordinates
Parallel coordinates
• Parallel coordinate layouts are similar to star glyphs, but they do not have an origin.
• Theoretically, the origin is placed at −infinity.
• Hence, all d coordinate axes are parallel.
• For each sample, one draws a polygonal line that connects, on each axis, the point that represents the sample's value of the respective attribute.
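The polygonal line of one sample can be computed as in the sketch below. It assumes vertical axes placed at equal horizontal spacing and a linear scaling of each attribute to the range [0, 1]; the function name and the parameters `mins`, `maxs`, and `spacing` are hypothetical.

```python
def parallel_coordinates_polyline(sample, mins, maxs, spacing=1.0):
    """Return the polyline vertices of one sample in parallel coordinates.

    Axis j is the vertical line x = j * spacing. Each attribute value is
    scaled linearly to [0, 1] using the attribute's minimum and maximum.
    """
    vertices = []
    for j, (value, lo, hi) in enumerate(zip(sample, mins, maxs)):
        # Constant attributes are drawn at mid-height.
        height = (value - lo) / (hi - lo) if hi > lo else 0.5
        vertices.append((j * spacing, height))
    return vertices
```

Drawing one such polyline per sample yields the full plot.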
Duality
• Line-point duality in parallel and Cartesian coordinates:
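The duality can be verified numerically. In the sketch below, two parallel axes are placed at x = 0 and x = d. A Cartesian point (x1, x2) becomes the line through (0, x1) and (d, x2), and all points on the Cartesian line x2 = m·x1 + b (with m ≠ 1) yield lines that pass through the single point (d/(1−m), b/(1−m)). The function names and the axis distance `d` are assumptions for this example.

```python
def segment_for_point(x1, x2, d=1.0):
    """A Cartesian point (x1, x2) becomes a segment between the two axes."""
    return (0.0, x1), (d, x2)

def dual_point(m, b, d=1.0):
    """Point in the parallel-coordinate plane dual to the line x2 = m*x1 + b."""
    if m == 1:
        raise ValueError("lines with slope 1 map to a point at infinity")
    return (d / (1.0 - m), b / (1.0 - m))

def height_at(segment, x):
    """Height of the (extended) segment at horizontal position x."""
    (xa, ya), (xb, yb) = segment
    return ya + (yb - ya) * (x - xa) / (xb - xa)
```

For a negative slope the dual point lies between the two axes, which is why negatively correlated attributes show up as a bundle of crossing lines.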
Examples
• Few vs. many samples:
Clutter
• For many samples, parallel coordinate plots may become cluttered.
• One way of reducing clutter is by interactive selection on the individual axes.
• Another way is to cluster the samples and to draw clusters.
Interactive selection
Clustering
Scatterplot matrix vs. Parallel Coordinates
Dimension reordering
• Star glyphs and parallel coordinates represent the values of all attributes.
• However, the order is still important.
• Correlations become obvious only between neighboring axes.
6.5 Summary on Visual Analytics of Multi-dimensional Data
Multidimensional data visualization
• Multidimensional data can be displayed by rendering sets of points in projected systems as in
– scatterplot matrices,
– scatterplots in projected spaces, or
– star coordinates
or by rendering polygons that intersect all axes as in
– parallel coordinates or
– star glyphs.
Scalability in dimensions
• Sorted from worst to best:
– scatterplot matrices
– star coordinates
– star glyphs
– parallel coordinates
• Order is an issue.
Scalability in number of samples
• Point renderings work better than line renderings.
• Clustering can help.
• Interaction can help.
6.6 Including non-quantitative attributes
Lattice plots
Small Multiples
Heatmap
6.7 Assignment
Assignment 4
Consider, once more, the six continuous numerical attributes of the Automobile Data Set as in Assignments 2 and 3.
1. Apply a spherical scaling to the dimensions and create a SPLOM.
2. Apply a PCA approach (with spherical scaling) to the 6D space.
3. Determine the intrinsic dimensionality.
4. Plot the projection to the 2D space spanned by two of the principal components. Allow for a reconfiguration by interactively choosing which components span the plot. Analyze the projected spaces visually. Do you see any clusters? Which? Allow for selecting clusters/outliers/groups and highlighting them in the SPLOM to analyze them further.
5. Apply a kernel PCA with a suitable kernel function. Perform a similar analysis as in 4. Do you get any new insights? Try different kernel settings.