dimensionality reduction from several...

81
Dimensionality Reduction From Several Angles Tamara Munzner Department of Computer Science University of British Columbia FODAVA Distinguished Lecture Georgia Tech 7 Dec 2012 http://www.cs.ubc.ca/~tmm/talks.html#fodava12

Upload: others

Post on 21-Sep-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Dimensionality Reduction From Several AnglesTamara MunznerDepartment of Computer ScienceUniversity of British Columbia

FODAVA Distinguished Lecture Georgia Tech 7 Dec 2012

http://www.cs.ubc.ca/~tmm/talks.html#fodava12

Page 2: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Dimensionality Reduction

• what is it?–map data from high-dimensional measured space into low-

dimensional target space

• when to use it?–when you can’t directly measure what you care about

• true dimensionality of dataset conjectured to be smaller than dimensionality of measurements

• latent factors, hidden variables

2

Page 3: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

DR Example

Tumor Measurement Data

DR

Malignant Benign

9 Dimensional Measured Space

3

2 Dimensional Target Space

Page 4: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Dimensionality Reduction

• why do people do DR?–improve performance of downstream algorithm

• avoid curse of dimensionality

–data analysis• if look at the output: visual data analysis

4

Page 5: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Angles of Attack

• design algorithms• design systems• design tools to solve real-world user problems• evaluate/validate all of these• create taxonomies to characterize existing things

• benefits of multiple angles–parallax view of what’s important–outcomes cross-pollinate

5

Page 6: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Questions: A Progression• can we design DR algorithms/techniques that are better than

previous ones?• can we build a DR system that real people use?• when do people need to look at DR output?

– how can we figure out what people need?

• how should people look at DR output?– how can we tell if we’re drawing the right picture?

– do metrics match up with human perception?

• why and how do people use DR?

6

Page 7: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Even More Questions

• open questions–how are real people actually using DR tools/techniques?

• does it match up with what we think/hope/assert/assume?

–why are they using it?• what are their goals and tasks, at abstract level?

–is it working?• how do their goals match up with implicit assumptions behind different benchmarks?

• do current state of the art tools meet their needs?

7

Page 8: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Dimensionality Reduction In the WildTasks and Challengesjoint work with:Michael Sedlmair, Matthew Brehmer, Stephen Ingram

8

work in progress

Page 9: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Two-Year Cross-Domain Qualitative Study

• in the wild–HCI term for work in the field with real users

• vs controlled lab setting

• interviewed two dozen high-dim data analysts–across over a dozen domains and past several years

• final results coming soon–taxonomy of abstract tasks for DR–identified significant unmet user needs

• why and how do people use DR?–overarching question weaving through projects in this talk–preliminary results from study informed many of them

9

Page 10: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Questions and Answers

• can we design DR algorithms/techniques that are better than previous ones?

• can we build a DR system that real people use?• when do people need to look at DR output?• how should people look at DR output?• why and how do people use DR?

• so... how do we answer these questions? –many validation methods to choose from!

10

Page 11: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

A Nested Modelof Visualization Design and Validation

A Nested Model of Visualization Design and Validation.Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009).

11

http://www.cs.ubc.ca/labs/imager/tr/2009/NestedModel/

Page 12: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

12

Four Levels of Design and Validation

• four levels of design problems–different threats to validity at each level

problem characterization: you misunderstood their needs

data/task abstraction: you’re showing them the wrong thing

visual encoding / interaction techniques: the way you show it doesn’t work

algorithm: your code is too slow

Page 13: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Matching Validation With Design Level

13

threat: wrong problem validate: observe and interview target users threat: bad data/operation abstraction threat: ineffective encoding/interaction technique validate: justify encoding/interaction design threat: slow algorithm

validate: analyze computational complexity implement system validate: measure system time/memory validate: qualitative/quantitative result image analysis [test on any users, informal usability study] validate: lab study, measure human time/errors for operation validate: test on target users, collect anecdotal evidence of utility validate: field study, document human usage of deployed system validate: observe adoption rates

Page 14: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Where Do We Go From Here?

• no single paper includes all methods of validation–pick methods based on angle of attack

• in this talk–cover many different methods and kinds of questions they

can help with answering

14

Page 15: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Outline

• can we design better DR algorithms?• can we build a DR system for real people?• how should we show people DR results?• when do people need to use DR?

15

Page 16: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Outline

• can we design better DR algorithms?–algorithm for GPU MDS: Glimmer–algorithm for MDS with costly distances: Glint

• can we build a DR system for real people?• how should we show people DR results?• when do people need to use DR?

16

Page 17: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

GlimmerMultilevel MDS on the GPU

Glimmer: Multilevel MDS on the GPU.Ingram, Munzner, Olano. IEEE TVCG 15(2):249-261, 2009.

joint work with:Stephen Ingram, Marc Olano

17

http://www.cs.ubc.ca/labs/imager/tr/2008/glimmer/

Page 18: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

18

MDS: Multidimensional Scaling

• entire family of methods, linear and nonlinear• classical scaling: minimize strain

–Nystrom/spectral methods: O(N)• Landmark MDS [de Silva 2004], PivotMDS [Brandes & Pich 2006]

–limitations: quality for very high dimensional sparse data

• distance scaling: minimize stress–nonlinear optimization: O(N2)

• SMACOF [de Leeuw 1977]

–force-directed placement: O(N2)• Stochastic Force [Chalmers 1996]• limitations: quality problems from local minima

• Glimmer goal: O(N) speed and high quality

Page 19: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Glimmer Strategy

• Stochastic force alg suitable for fast GPU port–but systematic testing shows it often terminates too soon

• Use as subsystem within new multilevel GPU alg with much better convergence properties

19

0.0

0.0

0.0

0.1

100 10075 20050 30025 40000Nor

mal

ized

Str

ess

(Log

)

Grid Cardinality

Page 20: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

20

Sparse Dataset (docs): N=D=28K

0

5.0000

10.0000

15.0000

20.0000

0 7500 15000 22500 30000

Tim

e (s

)

Cardinality

Glimmer Pivot MDS

0.1000

1.0000

0 2750 5500 8250 11000Nor

mal

ized

Str

ess

(Log

)

Cardinality

Glimmer

Pivot MDS

GlimmerPivot MDS

–quality higher–speed equivalent

Page 21: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Methods and Outcomes

• methods–quantitative algorithm benchmarks: speed, quality

• systematic comparison across 1K-10K instances vs a few spot checks

–qualitative judgements of layout quality

• outcomes–characterized kinds of datasets where technique yields

quality improvements

• then what?–saw what real users could do with it after release

• identified limitations

21

Page 22: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

GlintAn MDS Framework for Costly Distance Functions

Glint: An MDS Framework for Costly Distance Functions.Ingram, Munzner. Proc. SIGRAD 2012.

joint work with:Stephen Ingram

22

http://www.cs.ubc.ca/labs/imager/tr/2012/Glint/

Page 23: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

MDS Algorithm Speeds

• newer algorithms linear, but...

23

Algorithm Author/Year Complexity

Classic MDS Torgersen ‘52 O(N3)

SMACOF de Leeuw ‘77 O(N3)

Pivot MDS Brandes ‘07 O(kN)

Glimmer Ingram ‘09 O(cN)

LAMP Joia ‘11 O(kN)

Age

Page 24: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

MDS Speed on Coordinate Data

• time to calculate distance between two points–0.00001 second

24

Classic MDS SMACOF GlimmerPivot MDS LAMP

shuttle benchmarkN = 43K

D = 9

Hoursto Compute

1 Secondto Compute

Page 25: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

MDS Speed on Distance Matrix Data

• time to calculate distance between two points–0.01 second

25

flickr benchmarkN = 1925d = EMD

Classic MDS SMACOF GlimmerPivot MDS LAMP

Hours >1 hourmanual

Hours Hours

Page 26: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

MDS Input: Coordinates vs Distances

• some systems intrinsically require coordinates–fundamental to LAMP speedup approach

• some handle both–including Glimmer

26

High DimensionalGeometry

MDS Low DimensionalGeometry

CoordinateSpace

DistanceMatrix

CoordinateSpace

Page 27: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Costly Distances

• DR in the Wild revealed many real-world examples

27

Distance function Cost (seconds)

Euclidean on 9-D data 0.00001

Database Query 0.001

Earth Mover Distance 0.01

Euclidean on 4M-D data 1.0

Human-in-the-loop 10.0

Cheap

Costly

Page 28: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Glint Framework

• calculate as few distances as possible, maintain quality• three-stage architecture

28

Page 29: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Glint Instantiations

• framework accommodates broad spectrum of algorithm types–three instantiations provided

29

MDS Algorithm Type Chosen Algorithm

Gradient-based Optimization SMACOF

Spectral/Analytic Pivot MDS

Force-Directed Glimmer

Page 30: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Force-Directed Instantiation Results

30

Cheap dist

Costlydists

Speed Quality

Glimmer Original

Glimmer w/ Glint

major speed improvements while quality maintained

Page 31: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Methods and Outcomes

• methods–algorithm benchmarks

• outcomes–dataset characterization different from previous work

motivated by needs of real-world users–characterized distance metrics where architecture yields

speed improvements

• then what?–keep talking to real users as way to discover more unmet

needs

31

Page 32: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Outline

• can we design better DR algorithms?

–next: how do we get people to use DR properly?–move emphasis from solo algorithms to entire system

• can we build a DR system for real people?–system that provides guidance: DimStiller

• when do people need to use DR?• how should we show people DR results?• why and how do people use DR?

32

Page 33: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

DimStillerWorkflows for Dimensional Analysis and Reduction

DimStiller: Workflows for dimensional analysis and reduction.Ingram, Munzner, Irvine, Tory, Bergner, Moeller. Proc. VAST 2010, p 3-10.

joint work with:Stephen Ingram, Veronika Irvine, Melanie Tory, Steven Bergner, Torsten Möller

33

http://www.cs.ubc.ca/labs/imager/tr/2010/DimStiller/

Page 34: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Who Might Use DR?

• DR in the Wild revealed broad set of users

34

Math / Stats

Data Knowledge

Page 35: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Who Might Use DR?

35

Math / Stats

Data Knowledge

What’s a mean?

Took Stats in Undergrad

Best Paper at NIPS

Page 36: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Who Might Use DR?

36

Math / Stats

Data Knowledge

Dropped in lap

Total Information Awareness

Page 37: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Who Might Use DR?

37

Math / Stats

Data Knowledge

Pedagogical

Page 38: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Who Might Use DR?

38

Math / Stats

Data Knowledge

Don’t Need Analysis

Page 39: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Who Might Use DR?

39

Math / Stats

Data Knowledge

Well Defined Tasks

Page 40: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Who Might Use DR?

• middle ground users benefit from guidance

40

Math / Stats

Data Knowledge

Well Defined Tasks

Middle Ground Users

Page 41: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Global Guidance

41

Sloppy,Misunderstood

Compact,Evocative

Operator Space

Page 42: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Global Guidance

42

Sloppy,Misunderstood

Compact,Evocative

http://www.cs.cornell.edu/courses/cs322/2008sp/schedule.html

http://www.statmethods.net/advgraphs/images/corrgram3.png

http://en.wikibooks.org/wiki/File:Scree_plot_for_the_initial_dataset_Figure_36.jpg

http://www.scielo.cl/scielo.php?pid=S0716-078X2001000200019&script=sci_arttext

PCA

Correlation

MDS

Variance

Filter

http://www.iconfinder.com/icondetails/44818/400/data_filter_icon?r=1

http://www.personality-project.org/R/

SPLOM

Operator Space

Page 43: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Global Guidance

• which operations and in which order?

43

Sloppy,Misunderstood

Compact,Evocative

http://www.cs.cornell.edu/courses/cs322/2008sp/schedule.html

http://www.statmethods.net/advgraphs/images/corrgram3.png

http://en.wikibooks.org/wiki/File:Scree_plot_for_the_initial_dataset_Figure_36.jpg

http://www.scielo.cl/scielo.php?pid=S0716-078X2001000200019&script=sci_arttext

PCA

Correlation

MDS

Variance

Filter

http://www.iconfinder.com/icondetails/44818/400/data_filter_icon?r=1

http://www.personality-project.org/R/

SPLOM

Operator Space

Page 44: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Local Guidance

• what to do with a given operator?

44

Sloppy,Misunderstood

PCA

Correlation

MDS

Variance

Filter

SPLOM

Operator Space

Compact,Evocative

PCA

How many principal components?

What do they mean?

http://www.cs.cornell.edu/courses/cs322/2008sp/schedule.html

http://www.statmethods.net/advgraphs/images/corrgram3.png

http://en.wikibooks.org/wiki/File:Scree_plot_for_the_initial_dataset_Figure_36.jpg

http://www.scielo.cl/scielo.php?pid=S0716-078X2001000200019&script=sci_arttext

Page 45: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

45

Scree Plot for Local Guidance

DimStiller

• pre-built workflows

• sequence of operators

• local guidance for each operator

– example: estimate true dimensionality with scree plot

Page 46: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Methods and Outcomes

• methods–usage scenarios: workflows

• identified several (preliminary DRITW results)• built system to accommodate new ones as they’re uncovered

• outcomes–prototype system: “DR for the rest of us”

• then what?–who else needs guidance? not just end users!

46

Page 47: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Outline

• can we design better DR algorithms/techniques?• can we build a DR system for real people?

–next: more guidance about visual encoding

• how should we show people DR results?–visual encoding guidance for system developers:

Points vs Landscapes–visual encoding guidance for metric developers wrt human

perception: Visual Cluster Separation Factors

• when do people need to use DR?47

Page 48: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Spatialization DesignComparing Points and Landscapes

Spatialization Design: Comparing Points and Landscapes.Tory, Sprague, Wu, So, and Munzner.

IEEE TVCG 13(6):1262--1269, 2007 (Proc. InfoVis 07).

joint work with:Melanie Tory, David W. Sprague, Fuqu Wu, Wing Yan So

48

http://webhome.cs.uvic.ca/~mtory/publications/infovis2007.pdf

Page 49: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

49

Information Landscapes

• 2D or 3D landscape from set of DR points–height based on density

• oddly popular choice in DR–despite known occlusion/distortion problems with 3D–assertions: pattern recognition, spatial reasoning, familiar

Themescape: [http://www.k-n-o-r-z.de/publ/example/retriev1.htm]

[Guide to MicroPatent Aureka 9 ThemeScape]

Page 50: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

50

Understanding User Task

• abstract: search involving spatial areas and estimation

• domain-specific examples

• non-trivial complexity yet fast response time• frequent subtask in pilot test of real data analysis

Estimate which grid cell has the most points of the target color

“Where in the display are people with high incomes?”“Does this area also have high education levels?”“Does this area correspond to a particular work sector?”

Page 51: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

51

Lab Study: Test Human Response Time and Error

• hypotheses–points are better than landscapes

• result: yes! • much better: 2-4 × faster, 5-14 × more accurate

–2D landscapes (color only) better than 3D landscapes (color + height redundantly encoded)• result: yes• significantly faster, no significant difference in accuracy

Points 2D Landscape 3D Landscape

Page 52: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Methods and Outcomes

• methods–lab study: controlled experiment

• outcomes–prescriptive advice at visual encoding level

• avoid 3D landscapes

• then what?–yet more guidance from user studies? not so fast...

52

Page 53: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

A Taxonomy of

Visual Cluster Separation Factors

A Taxonomy of Visual Cluster Separation Factors.Sedlmair, Tatu, Munzner, Tory. Computer Graphics Forum 31(3):1335-1344, 2012 (Proc. EuroVis 2012).

joint work with:Michael Sedlmair, Andrada Tatu, Melanie Tory

53

http://www.cs.ubc.ca/labs/imager/tr/2012/VisClusterSep/

Page 54: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Cluster Separation

• simple idea

54

Variance

Influence

curvy

round

Isot

ropy

Curvature

Centroid

Size

small large

Countfew many

Densitysparse dense

Variance of Count similar different

Variance ofDensity similar different

Variance of Size similar different

Outlier

Sca

leP

oint

Dis

tanc

eP

ositi

on

Class/Point Count

few classesmany points

many classesfew points

Mixtureinterwovenequidistantrandom

Splitcontiguous split

Within-Class Factors Between-Class Factors

Clumpiness

clumpyone

dense spotuniformly random

many dense spotsequidistant

non-existent existentInner-Outer Position

distantadjacentpartial

overlap separatefull

overlapClass Separation

Variance of Shape

similar different

Sha

pe

manynone

misleadingevocative

narrowShape

VS. VS.

Page 55: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Visual Cluster Separation Measures

55

• Many cluster separation measures proposed for semi-automatic guidance in high-dim data analysis

Sips et al.: Selecting good views of high-dimensional data using class consistency [EuroVis 2009]

Tatu et al.: Combining automated analysis and visualization techniques for effective exploration of high-dimensional data [VAST 2009]

Good!

Page 56: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Visual Cluster Separation Measures

• goal: number captures whether human looking at layout sees something interesting–after computation is done, not to refine clustering

• measures checked with user studies

• but our attempt to use for guidance showed problems

56

Tatu et al.: Visual quality metrics and human perception: an initial study on 2D projections of large multidimensional data [AVI 2010]

Good!

No!

Page 57: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

User vs. Data Study

• user study–previous work on validating cluster

measures–many users, few datasets–missing: dataset variety

• data study–few users, many datasets

57

hundreds two

two

Page 58: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

816 Dataset Instances

• 75 datasets–31 real, 44 synthetic–pre-classified

• 4 DR methods–PCA–Robust PCA–Glimmer MDS–t-SNE

• 3 visual encoding methods–2D scatterplots, 3D scatterplots,

2D SPLOMs–color-coded by class

58

Page 59: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Centroid Measure

59

Centroid: 93

Good!

Bad!

Page 60: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Analysis Approach

• qualitative method out of social science: coding–open coding: gradually build/refine code set–axial coding: relationships between categories

• evaluating the measures–metric aligns with human judgement?–if not: what are the reasons?

60

Charmaz, K. Constructing Grounded Theory: A Practical Guide through Qualitative Analysis. 2006.

Furniss, D., Blandford, A., Curzon, P. and Mary, Q. (2011). Confessions from a grounded theory PhD: experiences and lessons learnt. Proc. ACM CHI 2011, p 113-122.

Page 61: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Qualitative Analysis I: Cluster Separation Factors

61

splitoutlier equidistantpoints

shape

Page 62: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Analysis Approach

• qualitative method out of social science: coding–open coding: gradually build/refine code set–axial coding: relationships between categories

• evaluating the measures–metric aligns with human judgement?–if not: what are the reasons?

• building taxonomy of factors from reasons• mapping measure failures onto taxonomy

62

Charmaz, K. Constructing Grounded Theory: A Practical Guide through Qualitative Analysis. 2006.

Furniss, D., Blandford, A., Curzon, P. and Mary, Q. (2011). Confessions from a grounded theory PhD: experiences and lessons learnt. Proc. ACM CHI 2011, p 113-122.

Page 63: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

A Taxonomy of Cluster Separation Factors

63

Variance

Influence

curvy

round

Isot

ropy

Curvature

Centroid

Size

small large

Countfew many

Densitysparse dense

Variance of Count similar different

Variance ofDensity similar different

Variance of Size similar different

Outlier

Sca

leP

oint

Dis

tanc

eP

ositi

on

Class/Point Count

few classesmany points

many classesfew points

Mixtureinterwovenequidistantrandom

Splitcontiguous split

Within-Class Factors Between-Class Factors

Clumpiness

clumpyone

dense spotuniformly random

many dense spotsequidistant

non-existent existentInner-Outer Position

distantadjacentpartial

overlap separatefull

overlapClass Separation

Variance of Shape

similar different

Sha

pe

manynone

misleadingevocative

narrowShape

VS. VS.

Page 64: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

High-Level Results

64

Centroid

Grid

0 25 50 75 100

Failure casesOk

51%

49%

Centroid

Grid

0 25 50 75 100

68%

65%

All (816)

Only real (296)

All failure cases Centroid

Grid

0 25 50 75 100

False PositivesFalse Negatives

68%

85%

Page 65: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Centroid Failure Example

65

• big classes overspread small ones

−4 −2 0 2 4

−4−2

02

4

.x

.y

●●

●●●

●●

●●

●●

●●

●●

● ●

●●●

●● ●

● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●● ●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●

● ●

●●

●● ●

●●

●● ●

Red: 77 (Good)Problem: FP

Data: Gaussian, syntheticDR: MDS

Page 66: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Relevant Taxonomy Factors

66

Variance

Influence

curvy

round

Isot

ropy

Curvature

Centroid

Size

small large

Countfew many

Densitysparse dense

Variance of Count similar different

Variance ofDensity similar different

Variance of Size similar different

Outlier

Sca

leP

oint

Dis

tanc

eP

ositi

on

Class/Point Count

few classesmany points

many classesfew points

Mixtureinterwovenequidistantrandom

Splitcontiguous split

Within-Class Factors Between-Class Factors

Clumpiness

clumpyone

dense spotuniformly random

many dense spotsequidistant

non-existent existentInner-Outer Position

distantadjacentpartial

overlap separatefull

overlapClass Separation

Variance of Shape

similar different

Sha

pe

manynone

misleadingevocative

narrowShape

VS. VS.

Variance

Influence

curvy

round

Isot

ropy

Curvature

Centroid

Size

small large

Countfew many

Densitysparse dense

Variance of Count similar different

Variance ofDensity similar different

Variance of Size similar different

Outlier

Sca

leP

oint

Dis

tanc

eP

ositi

on

Class/Point Count

few classesmany points

many classesfew points

Mixtureinterwovenequidistantrandom

Splitcontiguous split

Within-Class Factors Between-Class Factors

Clumpiness

clumpyone

dense spotuniformly random

many dense spotsequidistant

non-existent existentInner-Outer Position

distantadjacentpartial

overlap separatefull

overlapClass Separation

Variance of Shape

similar different

Sha

pe

manynone

misleadingevocative

narrowShape

VS. VS.

Page 67: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Centroid: Mapping Assumptions Into Taxonomy

67

• centroid only reliable if–round-ish clusters–not more than one dense spot–no outliers–similar sizes & number of points

• rarely true for real datasets

Page 68: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Related Work

• Scagnostics [Wilkinson et al. 2005]–mathematical description and algorithmic instantiation vs

human perception

68

Variance

Influence

curvy

round

Isot

ropy

Curvature

Centroid

Size

small large

Countfew many

Densitysparse dense

Variance of Count similar different

Variance ofDensity similar different

Variance of Size similar different

Outlier

Sca

leP

oint

Dis

tanc

eP

ositi

on

Class/Point Count

few classesmany points

many classesfew points

Mixtureinterwovenequidistantrandom

Splitcontiguous split

Within-Class Factors Between-Class Factors

Clumpiness

clumpyone

dense spotuniformly random

many dense spotsequidistant

non-existent existentInner-Outer Position

distantadjacentpartial

overlap separatefull

overlapClass Separation

Variance of Shape

similar different

Sha

pe

manynone

misleadingevocative

narrowShape

VS. VS.

Page 69: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Methods and Outcomes

• methods–qualitative data study

• we encourage more work along these lines

• outcomes–taxonomy to understand current problems

• measures

–taxonomy to advise future development• measures, techniques, systems

• then what?–from how to help them do DR better

to understanding when they need to do it at all69

Page 70: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Outline

• how can we design better DR algorithms/techniques?• how can we build a DR system for real people?• how should we show people DR results?

–next: continue figuring out what people need

• when do people need to use DR?–sometimes they don’t: QuestVis–how to figure out when they do or don’t:

Design Study Methodology

70

Page 71: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Reflections on

QuestVisA Visualization System for an Environmental Sustainability Model

Reflections on QuestVis: A Visualization System for an Environmental Sustainability Model Munzner, Barsky, Williams.

Scientific Visualization: Interactions, Features, Metaphors. Dagstuhl Follow-Ups 2, 2011, Chapter 17, p 240--259.

joint work with:Aaron Barsky, Matt Williams

71

http://www.cs.ubc.ca/labs/imager/tr/2011/QuestVis/

Page 72: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Application Domain: Sustainability

• user data: sustainability simulation model–high-dimensional inputs/outputs

• our decision: show relationship between input choices and output indicators with linked views including DR layout

72

Page 73: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Hammer Looking for A Nail

• wrong task abstraction: they didn’t need DR!–goal mismatch

• discussion of issues and behavior change from general public •not data analysis to understand exact relationships between input and output variables

–this failure case was one of motivations for nested model

• how can we tell what users actually need?–talking to users: necessary but not sufficient–we now have some answers!

• we have proposed a methodology for problem-driven research– design studies: build vis tools to solve user problems– DR as one of many possible techniques that might be used

73

Page 74: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Design Study MethodologyReflections from the Trenches and from the Stacks

Design Study Methodology: Reflections from the Trenches and from the Stacks.Sedlmair, Meyer, Munzner. IEEE TVCG 18(12): 2431-2440, 2012 (Proc. InfoVis 2012).

joint work with:Michael Sedlmair, Miriah Meyer

74

http://www.cs.ubc.ca/labs/imager/tr/2012/dsm/

Page 75: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

75

Design Studies

• long and winding road with many pitfalls–reflections after doing 21 of them

• many successes, a few failures, many lessons learned

Page 76: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

How To Do Design Studies

• definitions

• 9-stage framework

• 32 pitfalls and how to avoid them

INFORMATION LOCATION computerhead

TASK

CLA

RITY

fuzz

ycrisp

NO

T EN

OU

GH

DAT

A

DESIGN STUDY METHODOLOGY SUITABLE

ALGORITHM AUTOMATION

POSSIBLE

PRECONDITIONpersonal validation

COREinward-facing validation

ANALYSISoutward-facing validation

learn implementwinnow cast discover design deploy reflect write

76

Page 78: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Methods and Outcomes

• methods–introspection on lessons learned as authors and reviewers

• outcomes–prescriptive methodology advice

• here’s a way to do design studies• avoid these pitfalls

• exhortation–meta/how-to/reflection papers are worth doing–thinking about methods and methodologies is fruitful for

any flavor of research!

78

Page 79: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Work in Progress

• DR in the Wild–final results coming soon

• DR for journalism–Overview project http://overview.ap.org

–funded by Knight Foundation, collaboration with AP• starting point: Glimmer meets WikiLeaks

– led us to identify and address more unmet real-world analysis needs– new technique developed and partially deployed

• end point: stay tuned

79

Page 80: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Conclusions

• cross-fertilization from attacking DR through different methodological angles–scratching own itches often leads to problems that are

important and high impact• outcomes of evaluation informs how to build• grappling with issues of building informs what studies to run• taxonomy creation informs what to build: unsolved problems

• finding mismatches–between principles and practice–between practice and needs

• need parallax view of principles, practices, and needs!

80

Page 81: Dimensionality Reduction From Several Anglesfodava.gatech.edu/files/uploaded/DLS/munzner-talk.pdfDimensionality Reduction From Several Angles Tamara Munzner Department of Computer

Thanks and Questions• further info

– http://www.cs.ubc.ca/~tmm/talks#fodava12– http://www.cs.ubc.ca/~tmm/papers

• acknowledgements– funding: NSERC Strategic Grant– joint work: all collaborators

• Aaron Barsky, Steven Bergner, Matthew Brehmer, Stephen Ingram, Veronika Irvine, Miriah Meyer, Torsten Möller, Marc Olano, David W. Sprague, Melanie Tory, Michael Sedlmair, Wing Yan So, Andrada Tatu, Matt Williams, Fuqu Wu

– feedback on this talk• Matthew Brehmer, Joel Ferstay, Stephen Ingram, Torsten Möller, Michael Sedlmair,

Jessica Dawson

• hiring opportunity– Stephen Ingram (DimStiller, Glimmer, Glint) will finish PhD soon – http://www.cs.ubc.ca/~sfingram– available for hacker-analyst job in industry or research lab

• in spring 2014 after postdoc 81