Prof. Pier Luca Lanzi
Degree in Computer Engineering (Laurea in Ingegneria Informatica)
Politecnico di Milano, Polo di Milano Leonardo
Machine Learning Techniques for Data Mining Applications (Tecniche di Apprendimento Automatico per Applicazioni di Data Mining)
Visualization Techniques in Data Mining
© Pier Luca Lanzi
Outline
• Goals of visualization
• Advantages
• Methodologies
• Techniques
• User interaction
• Problems
Goals of Data Visualization
• Today there is a need to manage huge amounts of data, and computer systems help us in this task
• Visual data mining helps to deal with this flood of information by integrating the human into the data analysis process
• Visual data mining allows the user to gain insight into the data, draw conclusions, and interact directly with the data
Advantages of visualization techniques
The main advantages of applying visual data mining techniques are:
• Visual data exploration can easily deal with very large, highly non-homogeneous, and noisy data
• Visual data exploration requires no understanding of complex mathematical or statistical algorithms
• Visualization techniques provide a qualitative overview that is useful for further quantitative analysis
Approach methodologies
Confirmative Analysis:
• starting point: hypotheses about the data
• result: visualization of the data allowing confirmation or rejection of the hypotheses
Presentation:
• starting point: facts to be presented are fixed a priori
• result: high-quality visualization of the data presenting the facts
Explorative Analysis:
• starting point: data without hypotheses
• result: visualization of the data, which can provide hypotheses about the data distribution
Visualization techniques
• Geometric techniques: scatterplot matrices, HyperSlice, parallel coordinates
• Pixel-oriented techniques: simple line-by-line, spiral, and circle segments
• Hierarchical techniques: treemap, cone trees
• Graph-based techniques: 2D and 3D graphs
• Distortion techniques: hyperbolic tree, fisheye view, perspective wall
• User interaction: brushing, linking, dynamic projections and rotations, dynamic queries
Geometric techniques
Basic idea:
• Visualization of geometric transformations and projections of the data
Methods:
• Scatterplot matrices
• HyperSlice
• Parallel coordinates
Scatterplot matrices
• A scatterplot matrix is composed of scatter plots of all possible pairs of variables in a dataset
• For an N-dimensional dataset, there are (N² − N)/2 distinct pairs of variables, and hence (N² − N)/2 two-dimensional plots
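The pair count above can be checked with a short sketch. The attribute names are hypothetical and serve only as an illustration; the function enumerates the unordered pairs of variables that a scatterplot matrix would plot.

```python
from itertools import combinations


def scatterplot_pairs(variables):
    """Return every unordered pair of variables, one pair per scatter plot."""
    return list(combinations(variables, 2))


# Hypothetical attribute names, used only for illustration.
variables = ["age", "income", "height", "weight"]
pairs = scatterplot_pairs(variables)

n = len(variables)
# The number of distinct plots matches (N^2 - N) / 2.
assert len(pairs) == (n * n - n) // 2

for x_name, y_name in pairs:
    print(f"plot {x_name} against {y_name}")
```

The quadratic growth in the number of plots is one reason scatterplot matrices become hard to read for datasets with many attributes.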
HyperSlice
• HyperSlice is an extension of the scatterplot matrix
• It represents a multi-dimensional function as a matrix of orthogonal two-dimensional slices
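A minimal sketch of the slicing idea, under the assumption that each slice varies two dimensions on a regular grid while holding all other coordinates fixed at a chosen focal point. The names `slice_2d`, `hyperslice`, and `focus` are illustrative and not taken from the lecture; only the off-diagonal panels of the matrix are computed.

```python
def slice_2d(f, focus, i, j, lo, hi, steps):
    """Sample f on a steps x steps grid over dimensions i and j,
    holding every other coordinate fixed at the focal point."""
    grid = []
    for a in range(steps):
        row = []
        for b in range(steps):
            point = list(focus)
            point[i] = lo + (hi - lo) * a / (steps - 1)
            point[j] = lo + (hi - lo) * b / (steps - 1)
            row.append(f(point))
        grid.append(row)
    return grid


def hyperslice(f, focus, lo, hi, steps):
    """Return one 2D slice per unordered pair of dimensions."""
    n = len(focus)
    return {
        (i, j): slice_2d(f, focus, i, j, lo, hi, steps)
        for i in range(n)
        for j in range(i + 1, n)
    }


def sum_of_squares(point):
    """Example scalar function of many variables (an assumption for the demo)."""
    return sum(v * v for v in point)


slices = hyperslice(sum_of_squares, focus=[0.0, 0.0, 0.0], lo=-1.0, hi=1.0, steps=3)
print(sorted(slices))  # one entry per pair of dimensions
```

Moving the focal point and recomputing the slices is what lets the user explore the function interactively.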
Parallel Coordinates
• The axes are defined as parallel vertical lines, separated from one another
• A point in Cartesian coordinates corresponds to a polyline in parallel coordinates
• Able to visualize data that may be occluded in Cartesian coordinates
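The point-to-polyline mapping can be sketched as follows. This is an illustration under stated assumptions: the axes are equally spaced (`axis_spacing` is a hypothetical parameter), and each attribute is scaled to [0, 1] using per-attribute minimum and maximum values supplied by the caller.

```python
def to_polyline(point, mins, maxs, axis_spacing=1.0):
    """Map an N-dimensional point to N polyline vertices (x, y).

    Axis k is the vertical line x = k * axis_spacing; the y position is the
    attribute value rescaled to [0, 1] with that attribute's min and max.
    """
    vertices = []
    for k, value in enumerate(point):
        span = maxs[k] - mins[k]
        # A constant attribute has no spread, so place it mid-axis.
        y = (value - mins[k]) / span if span else 0.5
        vertices.append((k * axis_spacing, y))
    return vertices


# One 3-dimensional record becomes a polyline with three vertices.
print(to_polyline([5, 0, 10], mins=[0, 0, 0], maxs=[10, 10, 10]))
```

Drawing one such polyline per record gives the full plot; records with similar values produce polylines that run close together.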
Pixel-oriented techniques
Basic idea:
• Map each data value to a colored pixel
• Each attribute value is represented by a pixel, in a separate window, with a color tone proportional to a relevance factor
Methods:
• Simple line-by-line arrangement
• Spiral and circle segments techniques
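A minimal sketch of the simple line-by-line arrangement for one attribute. As an assumption, the attribute value itself stands in for the relevance factor and is mapped to a gray level; `to_gray` and `line_by_line` are illustrative names.

```python
def to_gray(value, lo, hi):
    """Map a value to a gray level in 0..255, proportional to its position in [lo, hi]."""
    if hi == lo:
        return 0
    return round(255 * (value - lo) / (hi - lo))


def line_by_line(pixels, width):
    """Arrange pixels row by row, left to right, in a window `width` pixels wide."""
    return [pixels[start:start + width] for start in range(0, len(pixels), width)]


# Hypothetical values of a single attribute, in record order.
values = [0, 10, 2, 8, 4, 6]
pixels = [to_gray(v, 0, 10) for v in values]
window = line_by_line(pixels, width=3)
for row in window:
    print(row)
```

Each attribute would get its own window built the same way, so the pixel at a given position in every window belongs to the same record.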
Pixel-oriented techniques
• Simple arrangement line-by-line
Pixel-oriented techniques
• Spiral
• Circle segments
Hierarchical techniques
Basic idea:
Visualization of the data using a hierarchical partitioning into two- or three-dimensional subspaces
Methods:
• Treemap
• Cone trees
Treemap
• Visualization of hierarchical collections of quantitative data, such as files on a hard drive, financial analysis, bioinformatics, etc.
• Divides a limited display area into a sequence of rectangles whose areas correspond to an attribute of the data set
http://www.smartmoney.com/marketmap/
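The area-proportional subdivision can be illustrated with a slice-and-dice layout, one of several treemap layouts and not necessarily the one shown in the lecture. As assumptions, the hierarchy is a nested dictionary whose leaves are positive sizes, and the split direction alternates at each level; the file names are hypothetical.

```python
def size(node):
    """Total size of a node: a leaf is a number, an inner node is a dict of children."""
    if isinstance(node, dict):
        return sum(size(child) for child in node.values())
    return node


def slice_and_dice(node, x, y, w, h, vertical=True, path="", out=None):
    """Assign a rectangle (x, y, w, h) to every leaf, with area proportional to its size."""
    if out is None:
        out = {}
    if not isinstance(node, dict):
        out[path] = (x, y, w, h)
        return out
    total = size(node)
    offset = 0.0
    for name, child in node.items():
        share = size(child) / total
        child_path = f"{path}/{name}"
        if vertical:
            # Split the width among the children, then alternate direction.
            slice_and_dice(child, x + offset * w, y, share * w, h, False, child_path, out)
        else:
            # Split the height among the children, then alternate direction.
            slice_and_dice(child, x, y + offset * h, w, share * h, True, child_path, out)
        offset += share
    return out


# Hypothetical directory tree with file sizes as the displayed attribute.
tree = {"docs": {"a.txt": 10, "b.txt": 30}, "video.mp4": 60}
rects = slice_and_dice(tree, 0, 0, 100, 100)
for leaf, rect in rects.items():
    print(leaf, rect)
```

Slice-and-dice is simple but can produce long thin rectangles; the ordered and quantum treemaps cited in the references address that.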
Cone trees
A three-dimensional extension of the more familiar 2D hierarchical tree structures, intended to give more intuitive navigation and display of information
Graph-based visualization
• Graphs (edges + nodes) with labels and attributes
• Used where the emphasis is on relationships in the data (databases, telecom)
• Coordinates not always meaningful
• Useful for discovering patterns
Graph-based visualization
• Color and thickness encode values
• Asymmetric relations:
Graph-based visualization
• E-mail (SeeNet)
Graph-based visualization
• 3D graphs:
– more room for objects
– different points of view
• Example (hypertexts – Narcissus):
Focus vs. context
• Too much data on screens that are too small
• Solutions:
– dual views (detailed + global)
– distorted view (e.g. fisheye view)
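As an illustration of a distorted view, the sketch below applies a one-dimensional fisheye transformation. It uses one commonly used distortion function, G(d) = (m + 1)d / (md + 1), in the form given by Sarkar and Brown; this is an assumption for the example and not necessarily the function used in the lecture. Coordinates are assumed to lie in [0, 1], and `distortion` is the hypothetical magnification parameter m.

```python
def fisheye(x, focus, distortion=3.0):
    """Distort a coordinate in [0, 1] so that space near `focus` is magnified.

    The normalized distance d from the focus to the screen boundary is
    remapped by G(d) = (m + 1) * d / (m * d + 1), with m = distortion.
    """
    if x == focus:
        return x
    boundary = 1.0 if x > focus else 0.0
    d_max = boundary - focus
    d_norm = (x - focus) / d_max  # in (0, 1]
    g = (distortion + 1) * d_norm / (distortion * d_norm + 1)
    return focus + g * d_max


# Points near the focus are pushed apart; the boundaries stay in place.
for x in [0.0, 0.4, 0.5, 0.6, 1.0]:
    print(x, fisheye(x, focus=0.5))
```

Setting `distortion` to 0 gives the identity mapping, which is a quick way to compare the distorted and undistorted views.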
Distortion
• Hyperbolic tree
• Fisheye view
• Perspective wall
User interaction
• Brushing: selecting points or regions
• Linking: multiple views work together
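Brushing and linking can be sketched without any plotting library. In this illustration, which assumes records stored as dictionaries with hypothetical attribute names, a rectangular brush in one view selects record indices, and a second view over different attributes flags the same records.

```python
def brush(records, x_key, y_key, x_range, y_range):
    """Return the indices of records whose (x, y) lies inside the brushed rectangle."""
    return {
        i
        for i, r in enumerate(records)
        if x_range[0] <= r[x_key] <= x_range[1] and y_range[0] <= r[y_key] <= y_range[1]
    }


def linked_view(records, x_key, y_key, selected):
    """Describe another view of the same records, flagging the brushed ones."""
    return [(r[x_key], r[y_key], i in selected) for i, r in enumerate(records)]


# Hypothetical records, used only for illustration.
records = [
    {"age": 25, "income": 30, "height": 170, "weight": 65},
    {"age": 40, "income": 80, "height": 180, "weight": 90},
    {"age": 60, "income": 50, "height": 165, "weight": 70},
]

# Brush young, low-income records in the (age, income) view...
selected = brush(records, "age", "income", x_range=(20, 45), y_range=(0, 40))
# ...and highlight the same records in the (height, weight) view.
for x, y, highlighted in linked_view(records, "height", "weight", selected):
    print(x, y, "highlighted" if highlighted else "")
```

The link works because both views share record indices, so a selection made in one view carries over to any other.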
User interaction
• Dynamic projections and rotations
– Interactively and continuously moving through subspaces
• Dynamic queries
– Visual interface (buttons and sliders)
– Incremental behavior (undo)
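The incremental behavior of dynamic queries can be sketched as a small state machine: each slider movement narrows or widens a range filter, and undo restores the previous filter state. The class and attribute names below are hypothetical.

```python
class DynamicQuery:
    """Range filters over records, with an undo history of filter states."""

    def __init__(self, records):
        self.records = records
        self.ranges = {}   # attribute -> (lo, hi), as a slider pair would set it
        self.history = []  # earlier filter states, most recent last

    def set_range(self, attribute, lo, hi):
        """Move one slider pair and return the updated result set."""
        self.history.append(dict(self.ranges))
        self.ranges[attribute] = (lo, hi)
        return self.result()

    def undo(self):
        """Restore the previous filter state, if there is one."""
        if self.history:
            self.ranges = self.history.pop()
        return self.result()

    def result(self):
        """Records that satisfy every active range filter."""
        return [
            r
            for r in self.records
            if all(lo <= r[a] <= hi for a, (lo, hi) in self.ranges.items())
        ]


# Hypothetical records, used only for illustration.
homes = [
    {"price": 10, "year": 2001},
    {"price": 50, "year": 2005},
    {"price": 90, "year": 2010},
]
query = DynamicQuery(homes)
print(len(query.set_range("price", 0, 60)))
print(len(query.set_range("year", 2004, 2012)))
print(len(query.undo()))
```

In an interactive tool the result set would be redrawn after every call, so the user sees the effect of each slider movement immediately.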
Problems
• Missing attributes
– Ignore
– Fill blanks with:
  • a predefined constant
  • a value drawn from the inferred distribution
– Assess the effect of interpolated values
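The two filling strategies can be compared with a small sketch. As an assumption, the "inferred distribution" is approximated here by the empirical distribution of the observed values, which is only one simple choice; missing entries are represented by `None`, and the data are hypothetical.

```python
import random
from statistics import mean


def fill_constant(values, constant):
    """Replace every missing entry (None) with a predefined constant."""
    return [constant if v is None else v for v in values]


def fill_from_observed(values, rng):
    """Replace every missing entry with a value drawn from the observed ones."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


# Hypothetical attribute with two missing entries.
ages = [23, None, 31, 45, None, 38]
observed = [v for v in ages if v is not None]

by_constant = fill_constant(ages, 0)
by_sampling = fill_from_observed(ages, random.Random(0))

# Assess the effect of the filled values on a summary statistic.
print("observed only:", mean(observed))
print("constant fill:", mean(by_constant))
print("sampled fill: ", mean(by_sampling))
```

Comparing a statistic before and after filling is a simple way to see how much the chosen strategy distorts the picture; a poorly chosen constant can shift the mean substantially.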
Problems
• Large data sets
– Typical screens have one million pixels
– Subsampling
– Voxel/pixel bins
– Jittering
• Large number of attributes
– Principal component analysis
– Factor analysis
– Etc.
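The three remedies listed for large data sets can be sketched in a few lines. This is an illustration under stated assumptions: points are 2D tuples that lie inside the given ranges, and the function names and parameters are hypothetical.

```python
import random
from collections import Counter


def subsample(points, k, rng):
    """Keep a random subset of k points."""
    return rng.sample(points, k)


def pixel_bins(points, width, height, x_range, y_range):
    """Count how many points fall on each pixel of a width x height screen."""
    counts = Counter()
    for x, y in points:
        col = int((x - x_range[0]) / (x_range[1] - x_range[0]) * width)
        row = int((y - y_range[0]) / (y_range[1] - y_range[0]) * height)
        # Points on the upper boundary belong to the last pixel.
        counts[(min(col, width - 1), min(row, height - 1))] += 1
    return counts


def jitter(points, amount, rng):
    """Add small uniform noise so that coincident points no longer overlap."""
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


rng = random.Random(0)
points = [(0, 0), (0.1, 0.1), (9.9, 9.9), (10, 10)]

bins = pixel_bins(points, width=2, height=2, x_range=(0, 10), y_range=(0, 10))
print(dict(bins))
print(subsample(points, 2, rng))
print(jitter(points, 0.05, rng))
```

Binning keeps every record but shows only counts per pixel, subsampling keeps individual records but drops most of them, and jittering helps when many records share exactly the same coordinates.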
Conclusions
• Human and computer skills can be integrated with visual data mining
• Visualization may be useful for:
– understanding what is happening
– searching for novel patterns
• User interaction is paramount in these tasks
References (I)
• D. A. Keim. “Visual Techniques for Exploring Databases”. Int. Conference on Knowledge Discovery in Databases, 1997.
• D. A. Keim. “Information visualization and visual data mining”. IEEE Trans. on Visualization and Computer Graphics, Jan. 2002, vol. 8, no. 1, pp. 1-8.
• J. Van Wijk, R. Van Liere. “HyperSlice - Visualization of scalar functions of many variables”. IEEE Visualization, 1993, pp. 119-125.
• P. C. Wong, A. H. Crabb, R. D. Bergeron. “Dual multiresolution HyperSlice for multivariate data visualization”. InfoVis 1996.
• D. A. Keim. “Pixel-oriented Database Visualizations”. SIGMOD Record, Special Issue on Information Visualization, 1996.
• M. Ankerst, D. A. Keim, H.-P. Kriegel. “Circle Segments: A Technique for Visually Exploring Large Multidimensional Data Sets”. Visualization '96, 1996.
• B. B. Bederson, B. Shneiderman, M. Wattenberg. “Ordered and Quantum Treemaps: Making Effective Use of 2D Space to Display Hierarchies”. ACM Transactions on Graphics, 2002, pp. 833-854.
References (II)
• R. A. Becker, S. G. Eick, A. R. Wilks. “Visualizing Network Data”. IEEE Trans. on Visualization and Computer Graphics, Mar. 1995, vol. 1, no. 1, pp. 16-28.
• R. J. Hendley, N. S. Drew, A. M. Wood, R. Beale. “Narcissus: visualising information”. InfoVis 1995, p. 90.
• T. A. Keahey, E. L. Robertson. “Techniques for non-linear magnification transformations”. InfoVis 1996.
• J. Lamping, R. Rao, P. Pirolli. “A focus+context technique based on hyperbolic geometry for visualizing large hierarchies”. CHI '95, pp. 401-408.
• J. D. Mackinlay, G. G. Robertson, S. K. Card. “The perspective wall: detail and context smoothly integrated”. CHI '91, pp. 173-176.