ENV 2006 6.1
Envisioning Information
Lecture 6 – Document Visualization
Ken Brodlie
kwb@comp.leeds.ac.uk
Document Visualization - Challenges
• Large collections of electronic text
– the Web is a prime example!
– E-mail archives
– Literature collections
• Can we use visualization to help us understand:
– content of groups of documents?
– relationships between documents?
• Powerful search and retrieval engines
– return documents based on some sort of keyword search
• Can we visualize the results of a query?
Views of Documents – 1D View
• Documents can be viewed in different dimensions: 1D, 2D, 3D, multidimensional
• Linear text
– Sees document as 1D string of words
– Split into tiles of ‘similar’ text
• Visualization idea
– Tilebars
– Each document a bar, length proportional to document length
– Shown as set of tiles, with shading indicating strength of relevance of tile to keywords
Hearst, CHI, 1995
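The Tilebars idea can be sketched in a few lines. This is a minimal illustration, not Hearst's implementation: it assumes fixed-size tiles (rather than segments of ‘similar’ text), uses relative keyword frequency as the relevance measure, and the names `tile_relevance`, `render_bar` and `SHADES` are made up for the example.

```python
import re

SHADES = " .:*#"  # lightest to darkest; stands in for grey-level shading


def tile_relevance(text, keywords, tile_size=50):
    """Return {keyword: [relative frequency of the keyword in each tile]}.

    Assumption: tiles are fixed runs of `tile_size` words, not the
    'similar text' segments the slide describes.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    tiles = [words[i:i + tile_size] for i in range(0, len(words), tile_size)]
    return {
        kw: [tile.count(kw.lower()) / len(tile) for tile in tiles]
        for kw in keywords
    }


def render_bar(freqs):
    """Draw one row of tiles; darker characters mean higher relevance."""
    peak = max(freqs, default=0.0)
    if peak == 0:
        return SHADES[0] * len(freqs)
    top = len(SHADES) - 1
    return "".join(SHADES[round(f / peak * top)] for f in freqs)
```

Each keyword gives one row of the bar, and the number of tiles grows with document length, so longer documents produce longer bars.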
2D Document View
• This is how we normally think of documents
– Structure on page is 2D
– Zooming interfaces have been developed
– Early one was PAD++: documents visible at different scales
– (return to zooming interfaces later)
http://www.cs.umd.edu/hcil/pad++/
3D Document Views
• Innovative 3D views have been suggested
WebBook: Card et al, CHI, 1996
Approach
• Generally the approach is in three steps:
– Analyse to capture essential features of document (for Tilebars, relative frequency of words in a segment of text)
– Use algorithms to generate a viable representation of the documents (1D representation in Tilebars)
– Create an interactive visual representation (clicking on a tile gives a list of the corresponding text with keywords highlighted)
Analysis → Algorithms → Visualization
Multidimensional Text
• Recent research sees text as multi-dimensional
• Document collection scanned for ‘distinguishing’ words
– Words distinctive to each document (keywords)
– Gives a mathematical ‘signature’ for each document as a high-dimensional vector
– Similarities between documents can then be calculated, so as to create clusters
– Clusters are mapped down to a 2D space, with similar clusters close together and dissimilar ones far apart
Galaxy – developed at PNNL, part of IN-SPIRE product
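A small sketch of the ‘signature’ idea: each document becomes a vector of keyword counts, and two documents are compared through their vectors. The slides do not say which similarity measure IN-SPIRE uses; cosine similarity is assumed here, and the vocabulary is supplied by hand rather than found by scanning the collection.

```python
import math
from collections import Counter


def signature(words, vocabulary):
    """Vector of counts, one entry per vocabulary term (the 'signature')."""
    counts = Counter(words)
    return [counts[term] for term in vocabulary]


def cosine_similarity(a, b):
    """1.0 for vectors pointing the same way, 0.0 for no shared terms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty signature is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Pairwise similarities computed this way are the input a clustering step would need.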
How do we transform from multidimensional to 2D space?
• Self-organising feature maps (Kohonen maps)
– Form of neural network
• Inputs are the vectors for each document
• Output is a 2D grid whose nodes represent clusters of similar documents, with related clusters placed close together
How does it work?
Multilingual information retrieval documents from database
Self-organising maps – A worked example
• Set of 311 documents in a database
• 40 key words extracted from titles
• Matrix of documents vs keywords created
• Set up rectangular grid (10 x 14 was used)
• Each node gets assigned a reference vector with small random values
       kw1  kw2  kw3
doc1    1    0    1
doc2    1    1    0
doc3    0    0    1
doc4    1    1    0
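The set-up steps can be sketched as follows. The titles and keywords are placeholders (`kw1`, `kw2`, `kw3`) chosen to reproduce the small matrix above; the range of the ‘small random values’ (0 to 0.1) is an assumption, as the slides do not give one.

```python
import random


def keyword_matrix(titles, keywords):
    """Binary documents-by-keywords matrix: 1 if the keyword is in the title."""
    matrix = []
    for title in titles:
        words = set(title.lower().split())
        matrix.append([1 if kw in words else 0 for kw in keywords])
    return matrix


def init_grid(rows, cols, dim, scale=0.1, seed=0):
    """rows x cols grid of reference vectors with small random values."""
    rng = random.Random(seed)
    return [[[rng.uniform(0.0, scale) for _ in range(dim)]
             for _ in range(cols)]
            for _ in range(rows)]


# Placeholder titles that give the same matrix as the table above.
titles = ["kw1 kw3", "kw1 kw2", "kw3", "kw1 kw2"]
matrix = keyword_matrix(titles, ["kw1", "kw2", "kw3"])

# The worked example uses a 10 x 14 grid and 40 keywords.
grid = init_grid(10, 14, 40)
```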
Self-organising maps – Worked example
• Select a document at random
• Find the ‘nearest’ reference vector in N-dimensional space (ie 40-D here)
• Adjust the reference vector to be closer to the document…
• …and adjust all its neighbours on the grid also
• Iterate (here for 2500 iterations)
• Finally map each document to nearest node
Example at grid node (5,7):
doc2: 1 1 0
Ref(5,7) before adjustment: 0.6 0.6 0.1
Ref(5,7) after adjustment: 0.9 0.9 0.03
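The training loop can be sketched as below. The slides give the steps but not the parameters, so the linearly decaying learning rate, the Gaussian neighbourhood and the shrinking radius are all assumptions, and the function names are made up for the example.

```python
import math
import random


def nearest_node(grid, vec):
    """Grid coordinates (row, col) of the reference vector closest to vec."""
    best, best_d2 = None, float("inf")
    for r, row in enumerate(grid):
        for c, ref in enumerate(row):
            d2 = sum((a - b) ** 2 for a, b in zip(ref, vec))
            if d2 < best_d2:
                best, best_d2 = (r, c), d2
    return best


def move_towards(ref, vec, rate):
    """Shift ref a fraction `rate` of the way towards vec."""
    return [a + rate * (b - a) for a, b in zip(ref, vec)]


def train_som(docs, rows, cols, iterations=2500, rate0=0.5, seed=0):
    """Train a rows x cols map on the document vectors (assumed schedule)."""
    rng = random.Random(seed)
    dim = len(docs[0])
    grid = [[[rng.uniform(0.0, 0.1) for _ in range(dim)]
             for _ in range(cols)] for _ in range(rows)]
    radius0 = max(rows, cols) / 2
    for t in range(iterations):
        decay = 1 - t / iterations             # shrinks from 1 towards 0
        rate = rate0 * decay
        radius = max(radius0 * decay, 0.5)
        doc = rng.choice(docs)                 # select a document at random
        br, bc = nearest_node(grid, doc)       # find the nearest reference vector
        for r in range(rows):                  # adjust it and its grid neighbours
            for c in range(cols):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                influence = math.exp(-d2 / (2 * radius ** 2))
                grid[r][c] = move_towards(grid[r][c], doc, rate * influence)
    return grid


def map_documents(grid, docs):
    """Final step: map each document to its nearest node."""
    return [nearest_node(grid, doc) for doc in docs]
```

With a rate of 0.75, `move_towards([0.6, 0.6, 0.1], [1, 1, 0], 0.75)` gives approximately (0.9, 0.9, 0.025), which matches the Ref(5,7) figures on the slide after rounding.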
Self-organising map – Worked example
Concept areas are clustered: languages; technologies; tools
Multidimensional Text
• The Galaxy View is extended by ThemeView
• High peaks indicate large number of documents with strong content similarity
• Peaks close together suggest themes which are related
http://in-spire.pnl.gov/
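A rough sketch of how a landscape of peaks could be derived from a 2D document layout: count the documents that fall in each cell of a grid and use the count as the height. This is only an illustration of the idea; how ThemeView actually computes its surface is not described in the slides, and `height_field` is a made-up name.

```python
def height_field(points, bins=4, extent=1.0):
    """Count documents per cell of a bins x bins grid over [0, extent]^2.

    Higher counts correspond to higher peaks in the landscape.
    """
    field = [[0] * bins for _ in range(bins)]
    for x, y in points:
        # Clamp so points on or beyond the boundary land in an edge cell.
        col = max(0, min(int(x / extent * bins), bins - 1))
        row = max(0, min(int(y / extent * bins), bins - 1))
        field[row][col] += 1
    return field
```

For example, two documents near (0.1, 0.1) and one near (0.9, 0.9) on a 2 x 2 grid give a peak of height 2 in one corner and height 1 in the opposite corner.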
Cartographic approach
• Cartographic principles are very relevant to document visualization
• Landscapes are very easy for us to recognise (cf faces)
• Level of detail well understood by cartographers (cf Google maps)
3 different zoom levels
Skupin, IEEE CG&A, 2002
2200 abstracts
Clusters formed
Case Study: Visualizing results from a search query
• Case study from NIST in US
• Suppose search returns a keyword strength
– ie user enters a number of keywords
– engine returns list of documents
– each document has a score for each keyword specified (eg number of occurrences)
– most relevant document has largest total score
• How can we visualize this information?
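The keyword-strength data that the following displays start from can be sketched as below, assuming the strength is simply the number of occurrences of each keyword (the example the slide gives). The function names are made up for the example and are not from the NIST system.

```python
import re


def keyword_strengths(text, keywords):
    """Number of occurrences of each keyword in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [words.count(kw.lower()) for kw in keywords]


def rank_documents(docs, keywords):
    """docs maps name -> text. Returns (name, strengths, total) tuples,
    most relevant (largest total score) first."""
    scored = []
    for name, text in docs.items():
        strengths = keyword_strengths(text, keywords)
        scored.append((name, strengths, sum(strengths)))
    return sorted(scored, key=lambda item: item[2], reverse=True)
```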
Document Spiral
Arrange docs in spiral, most relevant at centre
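One way to compute such a layout is an Archimedean spiral, with the rank of each document (0 = most relevant) setting how far along the spiral it sits. The spiral type and the `spacing` and `step` parameters are assumptions; the slide does not say how the NIST display spaces its documents.

```python
import math


def spiral_layout(n_docs, spacing=1.0, step=2.0):
    """(x, y) positions for documents ranked 0..n_docs-1, rank 0 at the centre.

    The angle grows with the square root of the rank so that neighbouring
    documents stay roughly evenly spaced along the spiral.
    """
    positions = []
    for rank in range(n_docs):
        angle = step * math.sqrt(rank)
        radius = spacing * angle / (2 * math.pi)
        positions.append((radius * math.cos(angle), radius * math.sin(angle)))
    return positions
```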
Document Three-Keyword Axes Display
One keyword per axis
Plot docs in a scatter plot using keyword strengths to position along axes
Same keyword on all axes lines docs up on X=Y=Z line
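The mapping from keyword strengths to scatter-plot coordinates is direct, as this short sketch shows. The data structure (a dictionary of per-keyword scores for each document) and the name `axes_positions` are assumptions made for the example.

```python
def axes_positions(strengths, axis_keywords):
    """strengths maps doc -> {keyword: score}; returns doc -> (x, y, z).

    Each axis takes the score of one chosen keyword; a missing keyword
    counts as strength 0.
    """
    kx, ky, kz = axis_keywords
    return {
        doc: (scores.get(kx, 0), scores.get(ky, 0), scores.get(kz, 0))
        for doc, scores in strengths.items()
    }
```

Choosing the same keyword for all three axes gives every document equal x, y and z, which is why the documents line up on the X=Y=Z line.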
Nearest Neighbour Sequence
Choose one doc and place on circle
Find the closest in ‘keyword strength’ space and place adjacent to it ... and so on
http://zing.ncsl.nist.gov/~cugini/uicd/viz.html
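The sequence can be sketched as a greedy chain: start from one document, then repeatedly append the unplaced document closest to the one placed last. Reading ‘and so on’ this way, and using Euclidean distance in keyword-strength space, are both assumptions; the function names are made up for the example.

```python
import math


def nearest_neighbour_sequence(vectors, start=0):
    """Order document indices so each is followed by its closest unplaced one."""
    if not vectors:
        return []
    order = [start]
    remaining = set(range(len(vectors))) - {start}
    while remaining:
        last = vectors[order[-1]]
        # Ties are broken by the lower index so the result is repeatable.
        nxt = min(remaining, key=lambda i: (math.dist(last, vectors[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def circle_positions(order, radius=1.0):
    """Place the ordered documents at equal angles around a circle."""
    n = len(order)
    return {
        doc: (radius * math.cos(2 * math.pi * k / n),
              radius * math.sin(2 * math.pi * k / n))
        for k, doc in enumerate(order)
    }
```

Documents with similar keyword strengths tend to end up next to each other on the circle, although a greedy chain can leave a large jump between the last few documents placed.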
Visualizing Web Searches
www.kartoo.co.uk