text analytics (text mining) -...

66
Text Analytics (Text Mining) LSI (uses SVD), Visualization CSE 6242 / CX 4242 Apr 3, 2014 Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Le Song

Upload: others

Post on 01-Jun-2020

24 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Text Analytics (Text Mining) LSI (uses SVD), Visualization

CSE 6242 / CX 4242 Apr 3, 2014

Duen Horng (Polo) ChauGeorgia Tech

Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Le Song

Page 2: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Singular Value Decomposition (SVD): Motivation

Problem #1: "" Text - LSI uses SVD find “concepts”""

Problem #2: "" Compression / dimensionality reduction

Page 3: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - MotivationProblem #1: text - LSI: find “concepts”

Page 4: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - MotivationCustomer-product, for recommendation system:

bread

lettu

cebe

ef

vegetarians

meat eaters

tomato

sch

icken

Page 5: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Motivation• problem #2: compress / reduce

dimensionality

Page 6: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Problem - Specification~10^6 rows; ~10^3 columns; no updates;"Random access to any cell(s); small error: OK

Page 7: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Motivation

Page 8: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Motivation

Page 9: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Definition(reminder: matrix multiplication)

x

3 x 2 2 x 1

=

Page 10: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Definition(reminder: matrix multiplication)

x

3 x 2 2 x 1

=

3 x 1

Page 11: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Definition(reminder: matrix multiplication)

x

3 x 2 2 x 1

=

3 x 1

Page 12: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Definition(reminder: matrix multiplication)

x

3 x 2 2 x 1

=

3 x 1

Page 13: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Definition(reminder: matrix multiplication)

x =

Page 14: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - DefinitionA[n x m] = U[n x r] Λ [ r x r] (V[m x r])T"

"

A: n x m matrix e.g., n documents, m terms"U: n x r matrix e.g., n documents, r concepts"Λ: r x r diagonal matrix r : rank of the matrix; strength of each ‘concept’"V: m x r matrix " e.g., m terms, r concepts

Page 15: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - DefinitionA[n x m] = U[n x r] Λ [ r x r] (V[m x r])T

= x xn

m r

rrn

mr

n documentsm terms

n documentsr concepts"

diagonal entries: concept strengths

m termsr concepts

Page 16: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - PropertiesTHEOREM [Press+92]:

always possible to decompose matrix A into A = U Λ VT"

U, Λ, V: unique, most of the time"U, V: column orthonormal "i.e., columns are unit vectors, orthogonal to each other"

UT U = I"VT V = I

Λ: diagonal matrix with non-negative diagonal entires, sorted in decreasing order

(I: identity matrix)

Page 17: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - ExampleA = U Λ VT - example:

datainf.

retrievalbrain lung

=CS

MD

x x

Page 18: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Example• A = U Λ VT - example:

datainf.

retrievalbrain lung

=CS

MD

x x

CS-conceptMD-concept

Page 19: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Example• A = U Λ VT - example:

datainf.

retrievalbrain lung

=CS

MD

x x

CS-conceptMD-concept

doc-to-concept similarity matrix

Page 20: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Example• A = U Λ VT - example:

datainf.

retrievalbrain lung

=CS

MD

x x

‘strength’ of CS-concept

Page 21: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Example• A = U Λ VT - example:

datainf.

retrievalbrain lung

=CS

MD

x x

term-to-concept similarity matrix

CS-concept

Page 22: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Example• A = U Λ VT - example:

datainf.

retrievalbrain lung

=CS

MD

x x

term-to-concept similarity matrix

CS-concept

Page 23: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #1‘documents’, ‘terms’ and ‘concepts’:!• U: document-to-concept similarity matrix!• V: term-to-concept sim. matrix!•Λ: its diagonal elements: ‘strength’ of each

concept

Page 24: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD – Interpretation #1‘documents’, ‘terms’ and ‘concepts’:!Q: if A is the document-to-term matrix, what

is AT A?!A:!Q: A AT ?!A:

Page 25: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD – Interpretation #1‘documents’, ‘terms’ and ‘concepts’:!Q: if A is the document-to-term matrix, what

is AT A?!A: term-to-term ([m x m]) similarity matrix!Q: A AT ?!A: document-to-document ([n x n]) similarity

matrix

Page 26: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

• V are the eigenvectors of the covariance matrix ATA !

!

• U are the eigenvectors of the Gram (inner-product) matrix AAT

SVD properties

Thus, SVD is closely related to PCA, and can be numerically more stable. For more info, see:http://math.stackexchange.com/questions/3869/what-is-the-intuitive-relationship-between-svd-and-pca Ian T. Jolliffe, Principal Component Analysis (2nd ed), Springer, 2002. Gilbert Strang, Linear Algebra and Its Applications (4th ed), Brooks Cole, 2005.

Page 27: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #2best axis to project on !

(‘best’ = min sum of squares of projection errors)

min RMS error

v1

First Singular Vector

Page 28: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #2

• A = U Λ VT - example:

= x x

variance (‘spread’) on the v1 axis

v1

Page 29: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #2

• A = U Λ VT - example:!–U Λ gives the coordinates of the points in the

projection axis

= x x

Page 30: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #2• More details!• Q: how exactly is dim. reduction done?

= x x

Page 31: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #2• More details!• Q: how exactly is dim. reduction done?!• A: set the smallest singular values to zero:

= x x

Page 32: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #2

~ x x

Page 33: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #2

~ x x

Page 34: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #2

~ x x

Page 35: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #2

~

Page 36: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #3• finds non-zero ‘blobs’ in a data matrix

= x x

Page 37: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #3• finds non-zero ‘blobs’ in a data matrix

= x x

Page 38: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #3• finds non-zero ‘blobs’ in a data matrix =!• ‘communities’ (bi-partite cores, here)

Row 1

Row 4

Col 1

Col 3

Col 4Row 5

Row 7

Page 39: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD algorithm

• Numerical Recipes in C (free)

Page 40: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #3• Drill: find the SVD, ‘by inspection’!!• Q: rank = ??

= x x?? ??

??

Page 41: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #3• A: rank = 2 (2 linearly independent rows/

cols)

= x x??

????

??

Page 42: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #3• A: rank = 2 (2 linearly independent rows/

cols)

= x x

orthogonal??

Page 43: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #3• column vectors: are orthogonal - but not

unit vectors:

= x x

1/sqrt(3) 01/sqrt(3) 01/sqrt(3) 0

0 1/sqrt(2)0 1/sqrt(2)

Page 44: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #3• and the singular values are:

= x x

1/sqrt(3) 01/sqrt(3) 01/sqrt(3) 0

0 1/sqrt(2)0 1/sqrt(2)

Page 45: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #3• Q: How to check we are correct?

= x x

1/sqrt(3) 01/sqrt(3) 01/sqrt(3) 0

0 1/sqrt(2)0 1/sqrt(2)

Page 46: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - Interpretation #3• A: SVD properties:!

–matrix product should give back matrix A –matrix U should be column-orthonormal, i.e.,

columns should be unit vectors, orthogonal to each other!

–ditto for matrix V –matrix Λ should be diagonal, with non-negative

values

Page 47: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

SVD - ComplexityO(n*m*m) or O(n*n*m) (whichever is less)!!

Faster version, if just want singular values! or if we want first k singular vectors! or if the matrix is sparse [Berry]!!

No need to write your own!Available in most linear algebra packages (LINPACK, matlab, Splus/R, mathematica ...)

Page 48: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

References• Berry, Michael: http://www.cs.utk.edu/~lsi/!• Fukunaga, K. (1990). Introduction to Statistical

Pattern Recognition, Academic Press.!• Press, W. H., S. A. Teukolsky, et al. (1992).

Numerical Recipes in C, Cambridge University Press.

Page 49: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Case study - LSIQ1: How to do queries with LSI?!Q2: multi-lingual IR (english query, on

spanish text?)

Page 50: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Case study - LSIQ1: How to do queries with LSI?!Problem: Eg., find documents with ‘data’

datainf.

retrievalbrain lung

=CS

MD

x x

Page 51: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Case study - LSIQ1: How to do queries with LSI?!A: map query vectors into ‘concept space’ – how?

datainf.

retrievalbrain lung

=CS

MD

x x

Page 52: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Case study - LSIQ1: How to do queries with LSI?!A: map query vectors into ‘concept space’ – how?

datainf.

retrievalbrain lung

q=

term1

term2

v1

q

v2

Page 53: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Case study - LSIQ1: How to do queries with LSI?!A: map query vectors into ‘concept space’ – how?

datainf.

retrievalbrain lung

q=

term1

term2

v1

q

v2

A: inner product (cosine similarity) with each ‘concept’ vector vi

Page 54: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Case study - LSIQ1: How to do queries with LSI?!A: map query vectors into ‘concept space’ – how?

datainf.

retrievalbrain lung

q=

term1

term2

v1

q

v2

A: inner product (cosine similarity) with each ‘concept’ vector vi

q o v1

Page 55: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Case study - LSIcompactly, we have:! q V= qconcept!

Eg:data

inf.retrieval

brain lung

q=

term-to-concept similarities

=

CS-concept

Page 56: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Case study - LSIDrill: how would the document (‘information’,

‘retrieval’) be handled by LSI?

Page 57: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Case study - LSIDrill: how would the document (‘information’,

‘retrieval’) be handled by LSI? A: SAME:!dconcept = d V Eg: data

inf.retrieval

brain lung

d=

term-to-concept similarities

=

CS-concept

Page 58: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Case study - LSIObservation: document (‘information’,

‘retrieval’) will be retrieved by query (‘data’), although it does not contain ‘data’!!

datainf.

retrievalbrain lung

d=

CS-concept

q=

Page 59: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Case study - LSIQ1: How to do queries with LSI?!Q2: multi-lingual IR (english query, on

spanish text?)

Page 60: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Case study - LSI• Problem:!

–given many documents, translated to both languages (eg., English and Spanish)!

–answer queries across languages

Page 61: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Case study - LSI• Solution: ~ LSI

datainf.

retrievalbrain lung

CS

MD

datosinformacion

Page 62: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Switch Gear to Text Visualization

What comes up to your mind?!

What visualization have you seen before?

62

Page 63: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Word/Tag Cloud (still popular?)

http://www.wordle.net!63

Page 64: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Word Counts (words as bubbles)

http://www.infocaptor.com/bubble-my-page64

Page 65: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Word Tree

http://www.jasondavies.com/wordtree/65

Page 66: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014  · Text Analytics (Text Mining) LSI (uses

Phrase Net

http://www-958.ibm.com/software/data/cognos/manyeyes/page/Phrase_Net.html 66

Visualize pairs of words that satisfy a particular pattern, e.g., X and Y