EXPLORING TEMPORAL GRAPH DATA WITH PYTHON A STUDY ON TENSOR DECOMPOSITION OF WEARABLE SENSOR DATA ANDRÉ PANISSON @apanisson ISI Foundation, Torino, Italy & New York City

Upload: andre-panisson

Post on 09-Feb-2017


Page 1: Exploring temporal graph data with Python: a study on tensor decomposition of wearable sensor data (PyData NYC 2015)

EXPLORING TEMPORAL GRAPH DATA WITH PYTHON: A STUDY ON TENSOR DECOMPOSITION OF WEARABLE SENSOR DATA

ANDRÉ PANISSON

@apanisson ISI Foundation, Torino, Italy & New York City

Page 2

WHY TENSOR FACTORIZATION + PYTHON?

▸ Matrix Factorization is already used in many fields

▸ Tensor Factorization is becoming very popular for multiway data analysis

▸ TF is very useful to explore temporal graph data

▸ But still, the most used tool is Matlab

▸ There’s room for improvement in the Python libraries for TF

▸ Study: NTF of wearable sensor data

Page 3

TENSORS AND TENSOR DECOMPOSITION

Page 4

FACTOR ANALYSIS

Spearman ~1900

$X \approx WH$

$X_{\text{tests} \times \text{subjects}} \approx W_{\text{tests} \times \text{intelligences}} \, H_{\text{intelligences} \times \text{subjects}}$

Spearman, 1927: The abilities of man.

[Figure: X (tests × subjects) ≈ W (tests × Int.) · H (Int. × subjects)]

Page 5

TOPIC MODELING / LATENT SEMANTIC ANALYSIS

Blei, David M. "Probabilistic topic models." Communications of the ACM 55.4 (2012): 77-84.

[Figure: Topics (word lists with probabilities, e.g. "gene, dna, genetic", "life, evolve, organism", "brain, neuron, nerve", "data, number, computer"), Documents, and Topic proportions and assignments]

Page 6

TOPIC MODELING / LATENT SEMANTIC ANALYSIS

Non-negative Matrix Factorization (NMF): $X \approx WH$

(~1970 Lawson, ~1995 Paatero, ~2000 Lee & Seung)

2005 Gaussier et al. "Relation between PLSA and NMF and implications."

$\arg\min_{W,H} \|X - WH\| \quad \text{s.t. } W, H \ge 0$

[Figure: X (documents × terms) ≈ W (documents × topics) · H (topics × terms). X is a sparse matrix!]

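Page 7

NON-NEGATIVE MATRIX FACTORIZATION (NMF)

NMF gives part-based representation (Lee & Seung, Nature 1999)

[Figure: an original image compared with its NMF and PCA representations (basis × encoding = reconstruction)]

NMF is equivalent to Spectral Clustering (Ding et al., SDM 2005)

Multiplicative updates (the product • and the fraction are element-wise; V denotes the data matrix, written X in the objective):

$W \leftarrow W \bullet \frac{V H^T}{W H H^T}$

$H \leftarrow H \bullet \frac{W^T V}{W^T W H}$

$\arg\min_{W,H} \|X - WH\| \quad \text{s.t. } W, H \ge 0$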
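The multiplicative update rules on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the speaker's code: the function name `nmf_multiplicative`, the random initialization, the fixed iteration count and the small `eps` added to the denominators are all choices made here for the example.

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=200, eps=1e-12, seed=0):
    """Approximate a non-negative matrix V (n x m) by W (n x r) times H (r x m).

    Hypothetical helper for illustration; eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r))
    H = rng.random((r, m))
    for _ in range(n_iter):
        # W <- W * (V H^T) / (W H H^T), element-wise
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # H <- H * (W^T V) / (W^T W H), element-wise
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return W, H

# Small synthetic example: a non-negative matrix of rank 3
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 15))
W, H = nmf_multiplicative(V, r=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because every factor in the update is non-negative, W and H stay non-negative without any explicit projection step.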

Page 8

from sklearn import datasets, decomposition
import matplotlib.pyplot as plt

# 8x8 handwritten digit images, one flattened image per row
digits = datasets.load_digits()
A = digits.data

# Factorize into 10 non-negative components
nmf = decomposition.NMF(n_components=10)
W = nmf.fit_transform(A)
H = nmf.components_

# Show each component as an 8x8 image
plt.rc("image", cmap="binary")
plt.figure(figsize=(8, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(H[i].reshape(8, 8))
    plt.xticks(())
    plt.yticks(())
plt.tight_layout()

Page 9

BEYOND MATRICES: HIGH DIMENSIONAL DATASETS

Cichocki et al. Nonnegative Matrix and Tensor Factorizations

Environmental analysis
▸ Measurement as a function of (Location, Time, Variable)
Sensory analysis
▸ Score as a function of (Food sample, Judge, Attribute)
Process analysis
▸ Measurement as a function of (Batch, Variable, Time)
Spectroscopy
▸ Intensity as a function of (Wavelength, Retention, Sample, Time, Location, …)

MULTIWAY DATA ANALYSIS

Page 10

DIGITAL TRACES FROM SENSORS AND IOT

USER POSITION TIME …

Page 11

• Chemometrics
  – Fluorescence Spectroscopy
  – Chromatographic Data Analysis
• Neuroscience
  – Epileptic Seizure Localization
  – Analysis of EEG and ERP
• Signal Processing
• Computer Vision
  – Image compression, classification
  – Texture analysis
• Social Network Analysis
  – Web link analysis
  – Conversation detection in emails
  – Text analysis
• Approximation of PDEs

data reconstruction, cluster analysis, compression, dimensionality reduction, latent semantic analysis, …

Sidiropoulos, Giannakis and Bro, IEEE Trans. Signal Processing, 2000.
Mørup, Hansen and Arnfred, Journal of Neuroscience Methods, 2007.
Hazan, Polak and Shashua, ICCV 2005.
Bader, Berry, Browne, Survey of Text Mining: Clustering, Classification, and Retrieval, 2nd Ed., 2007.
Doostan and Iaccarino, Journal of Computational Physics, 2009.
Andersen and Bro, Journal of Chemometrics, 2003.

Page 12

TENSORS

Page 13

WHAT IS A TENSOR?

A tensor is a multidimensional array.
E.g., three-way tensor:

[Figure: a three-way tensor with axes Mode-1, Mode-2 and Mode-3]

Page 14

FIBERS AND SLICES

Cichocki et al. Nonnegative Matrix and Tensor Factorizations

Column (Mode-1) Fibers: A[:, 4, 1]
Row (Mode-2) Fibers: A[1, :, 4]
Tube (Mode-3) Fibers: A[1, 3, :]

Horizontal Slices: A[1, :, :]
Lateral Slices: A[:, 1, :]
Frontal Slices: A[:, :, 1]
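As a small illustration of the indexing patterns above, the following NumPy sketch extracts one fiber of each kind and one slice of each kind. The tensor shape and the index values are arbitrary choices for this example.

```python
import numpy as np

A = np.arange(60).reshape((3, 4, 5))   # a 3 x 4 x 5 three-way tensor

# Fibers: fix every index but one
column_fiber = A[:, 1, 2]   # mode-1 fiber, length 3
row_fiber = A[0, :, 2]      # mode-2 fiber, length 4
tube_fiber = A[0, 1, :]     # mode-3 fiber, length 5

# Slices: fix exactly one index
horizontal_slice = A[0, :, :]   # shape (4, 5)
lateral_slice = A[:, 1, :]      # shape (3, 5)
frontal_slice = A[:, :, 2]      # shape (3, 4)
```

A fiber is always a vector and a slice is always a matrix, whatever the mode that is left free or fixed.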

Page 15

TENSOR UNFOLDINGS: MATRICIZATION AND VECTORIZATION

Matricization: convert a tensor to a matrix

Vectorization: convert a tensor to a vector

Page 16

>>> import numpy as np
>>> T = np.arange(0, 24).reshape((3, 4, 2))
>>> T
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7]],

       [[ 8,  9],
        [10, 11],
        [12, 13],
        [14, 15]],

       [[16, 17],
        [18, 19],
        [20, 21],
        [22, 23]]])

OK for dense tensors: use a combination of transpose() and reshape()

Not simple for sparse datasets (e.g.: <authors, terms, time>)

>>> for j in range(2):
...     for i in range(4):
...         print(T[:, i, j])
...
[ 0  8 16]
[ 2 10 18]
[ 4 12 20]
[ 6 14 22]
[ 1  9 17]
[ 3 11 19]
[ 5 13 21]
[ 7 15 23]

# supposing the existence of unfold

>>> T.unfold(0)
array([[ 0,  2,  4,  6,  1,  3,  5,  7],
       [ 8, 10, 12, 14,  9, 11, 13, 15],
       [16, 18, 20, 22, 17, 19, 21, 23]])
>>> T.unfold(1)
array([[ 0,  8, 16,  1,  9, 17],
       [ 2, 10, 18,  3, 11, 19],
       [ 4, 12, 20,  5, 13, 21],
       [ 6, 14, 22,  7, 15, 23]])
>>> T.unfold(2)
array([[ 0,  8, 16,  2, 10, 18,  4, 12, 20,  6, 14, 22],
       [ 1,  9, 17,  3, 11, 19,  5, 13, 21,  7, 15, 23]])
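NumPy arrays have no `unfold` method, so the calls on this slide are hypothetical. For dense arrays, one possible way to obtain the same matrices is a free function that moves the chosen mode to the front and reshapes in column-major order. The names `unfold` and `vec` below are assumptions made for this sketch.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` matricization of a dense tensor.

    The chosen mode indexes the rows; the remaining modes are flattened
    with the lowest remaining mode varying fastest (column-major order).
    """
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order='F')

T = np.arange(0, 24).reshape((3, 4, 2))

T0 = unfold(T, 0)   # shape (3, 8)
T1 = unfold(T, 1)   # shape (4, 6)
T2 = unfold(T, 2)   # shape (2, 12)

# Vectorization in the same column-major convention
vec = T.reshape(-1, order='F')
```

This covers only the dense case; as the slide notes, sparse tensors need a different representation.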

Page 17

RANK-1 TENSOR

The outer product of N vectors results in a rank-1 tensor

$T = a^{(1)} \circ \cdots \circ a^{(N)}$

$T_{i,j,k} = a_i b_j c_k$

[Figure: T = a ∘ b ∘ c]

a = np.array([1, 2, 3])
b = np.array([1, 2, 3, 4])
c = np.array([1, 2])

T = np.zeros((a.shape[0], b.shape[0], c.shape[0]))

for i in range(a.shape[0]):
    for j in range(b.shape[0]):
        for k in range(c.shape[0]):
            T[i, j, k] = a[i] * b[j] * c[k]

array([[[  1.,   2.],
        [  2.,   4.],
        [  3.,   6.],
        [  4.,   8.]],

       [[  2.,   4.],
        [  4.,   8.],
        [  6.,  12.],
        [  8.,  16.]],

       [[  3.,   6.],
        [  6.,  12.],
        [  9.,  18.],
        [ 12.,  24.]]])
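The triple loop can also be written without explicit loops. The sketch below shows two equivalent NumPy formulations of the same outer product; it is an illustration, not part of the original slide.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3, 4])
c = np.array([1, 2])

# T[i, j, k] = a[i] * b[j] * c[k], written as an Einstein summation
T_einsum = np.einsum('i,j,k->ijk', a, b, c)

# The same tensor built from two successive outer products
T_outer = np.multiply.outer(np.multiply.outer(a, b), c)
```

Both results have shape (3, 4, 2) and hold the same values as the loop version, with an integer dtype here because the input vectors are integers.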

Page 18

TENSOR RANK

▸ Every tensor can be written as a sum of rank-1 tensors

[Figure: tensor = a_1 ∘ b_1 ∘ c_1 + … + a_J ∘ b_J ∘ c_J]

▸ Tensor rank: smallest number of rank-1 tensors that can generate it by summing up

$X \approx \sum_{r=1}^{R} a^{(1)}_r \circ a^{(2)}_r \circ \cdots \circ a^{(N)}_r \equiv [\![A^{(1)}, A^{(2)}, \cdots, A^{(N)}]\!]$

$T \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r \equiv [\![A, B, C]\!]$

Page 19

Kruskal Tensor: $T \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r \equiv [\![A, B, C]\!]$

A = np.array([[1, 2, 3], [4, 5, 6]]).T
B = np.array([[1, 2, 3, 4], [5, 6, 7, 8]]).T
C = np.array([[1, 2], [3, 4]]).T

T = np.zeros((A.shape[0], B.shape[0], C.shape[0]))
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        for k in range(C.shape[0]):
            for r in range(A.shape[1]):
                T[i, j, k] += A[i, r] * B[j, r] * C[k, r]

array([[[  61.,   82.],
        [  74.,  100.],
        [  87.,  118.],
        [ 100.,  136.]],

       [[  77.,  104.],
        [  94.,  128.],
        [ 111.,  152.],
        [ 128.,  176.]],

       [[  93.,  126.],
        [ 114.,  156.],
        [ 135.,  186.],
        [ 156.,  216.]]])

T = np.einsum('ir,jr,kr->ijk', A, B, C)

Page 20

TENSOR FACTORIZATION

▸ CANDECOMP/PARAFAC factorization (CP)
▸ extensions of SVD / PCA / NMF of matrices

NON-NEGATIVE TENSOR FACTORIZATION

▸ Decompose a non-negative tensor to a sum of R non-negative rank-1 tensors

$\arg\min_{A,B,C} \| T - [\![A, B, C]\!] \|$

with $[\![A, B, C]\!] \equiv \sum_{r=1}^{R} a_r \circ b_r \circ c_r$

subject to $A \ge 0, B \ge 0, C \ge 0$

Page 21

TENSOR FACTORIZATION: HOW TO

Alternating Least Squares (ALS): fix all but one factor matrix, to which LS is applied

$\min_{A \ge 0} \| T_{(1)} - A (C \odot B)^T \|$

$\min_{B \ge 0} \| T_{(2)} - B (C \odot A)^T \|$

$\min_{C \ge 0} \| T_{(3)} - C (B \odot A)^T \|$

$\odot$ denotes the Khatri-Rao product, which is a column-wise Kronecker product, i.e., $C \odot B = [c_1 \otimes b_1, c_2 \otimes b_2, \ldots, c_R \otimes b_R]$

Unfolded tensor on the k-th mode:

$T_{(1)} = A (C \odot B)^T$

$T_{(2)} = B (C \odot A)^T$

$T_{(3)} = C (B \odot A)^T$
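The three unfolding identities can be checked numerically on a small Kruskal tensor. The sketch below is an illustration under assumptions: `unfold` and `khatri_rao` are helper names introduced here, and the unfolding uses the column-major convention in which the lowest remaining mode varies fastest.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` matricization (lowest remaining mode varies fastest)."""
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order='F')

def khatri_rao(P, Q):
    """Column-wise Kronecker product of P (p x R) and Q (q x R), shape (p*q, R)."""
    R = P.shape[1]
    return np.einsum('pr,qr->pqr', P, Q).reshape(-1, R)

# Factor matrices with R = 2 columns each
A = np.array([[1, 2, 3], [4, 5, 6]]).T
B = np.array([[1, 2, 3, 4], [5, 6, 7, 8]]).T
C = np.array([[1, 2], [3, 4]]).T
T = np.einsum('ir,jr,kr->ijk', A, B, C)

# Each column of the Khatri-Rao product is a Kronecker product of columns
col_ok = np.array_equal(khatri_rao(C, B)[:, 0], np.kron(C[:, 0], B[:, 0]))

# The three identities from the slide
ok1 = np.array_equal(unfold(T, 0), A @ khatri_rao(C, B).T)
ok2 = np.array_equal(unfold(T, 1), B @ khatri_rao(C, A).T)
ok3 = np.array_equal(unfold(T, 2), C @ khatri_rao(B, A).T)
```

With a different unfolding convention the order of the two factors inside the Khatri-Rao product changes, so the helper pair has to be kept consistent.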

Page 22

import numpy as np

# n, m, t: tensor dimensions; r: number of components.
# Random non-negative start: an all-zero start gives 0/0 under the update below.
F = [np.random.rand(n, r), np.random.rand(m, r), np.random.rand(t, r)]
FF_init = np.array([f.T.dot(f) for f in F])

def iter_solver(T, F, FF_init):
    # Update each factor
    for k in range(len(F)):
        # Compute the inner-product matrix
        FF = np.ones((r, r))
        for i in list(range(k)) + list(range(k + 1, len(F))):
            FF = FF * FF_init[i]

        # Unfolded tensor times Khatri-Rao product
        # (T is assumed to provide uttkrp, e.g. a scikit-tensor tensor)
        XF = T.uttkrp(F, k)

        F[k] = F[k] * XF / (F[k].dot(FF))
        # Alternative: F[k] = nnls(FF, XF.T).T

        FF_init[k] = F[k].T.dot(F[k])
    return F, FF_init

$W \leftarrow W \bullet \frac{V H^T}{W H H^T}$

$H \leftarrow H \bullet \frac{W^T V}{W^T W H}$

$\arg\min_{W,H} \|X - WH\| \quad \text{s.t. } W, H \ge 0$

$\min_{A \ge 0} \| T_{(1)} - A (C \odot B)^T \|$

$\min_{B \ge 0} \| T_{(2)} - B (C \odot A)^T \|$

$\min_{C \ge 0} \| T_{(3)} - C (B \odot A)^T \|$

J. Kim and H. Park. Fast Nonnegative Tensor Factorization with an Active-set-like Method. In High-Performance Scientific Computing: Algorithms and Applications, Springer, 2012, pp. 311-326.
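For small dense tensors, the same multiplicative scheme can be tried with NumPy alone. The sketch below is a hedged illustration rather than the speaker's implementation: `mttkrp` and `ntf` are names made up here, the tensor is restricted to three modes, `np.einsum` stands in for the `uttkrp` call, and `eps` is an added guard against division by zero.

```python
import numpy as np

def mttkrp(T, F, k):
    """Mode-k unfolded tensor times the Khatri-Rao product of the other factors."""
    subs = ['ir', 'jr', 'kr']
    others = [m for m in range(3) if m != k]
    expr = 'ijk,{},{}->{}'.format(subs[others[0]], subs[others[1]], subs[k])
    return np.einsum(expr, T, F[others[0]], F[others[1]])

def ntf(T, r, n_iter=200, eps=1e-12, seed=0):
    """Non-negative CP factorization of a dense 3-way tensor (illustrative)."""
    rng = np.random.default_rng(seed)
    F = [rng.random((dim, r)) for dim in T.shape]
    for _ in range(n_iter):
        for k in range(3):
            # Hadamard product of the Gram matrices of the other factors
            FF = np.ones((r, r))
            for i in range(3):
                if i != k:
                    FF *= F[i].T @ F[i]
            XF = mttkrp(T, F, k)
            # Multiplicative update keeps F[k] non-negative
            F[k] *= XF / (F[k] @ FF + eps)
    return F

# Synthetic non-negative tensor built from known rank-2 factors
rng = np.random.default_rng(1)
A, B, C = rng.random((6, 2)), rng.random((5, 2)), rng.random((4, 2))
T = np.einsum('ir,jr,kr->ijk', A, B, C)

F = ntf(T, r=2)
T_hat = np.einsum('ir,jr,kr->ijk', *F)
rel_err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
```

The dense `einsum` contraction is convenient for checking the logic; for large sparse tensors such as contact data, a sparse tensor type with its own unfolded-product routine is the more realistic choice.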

Page 23

HOW TO INTERPRET: USER X TERM X TIME

X is a 3-way tensor in which $x_{nmt}$ is 1 if the term m was used by user n at interval t, 0 otherwise
A (N×K) is the association of each user n to a factor k
B (M×K) is the association of each term m to a factor k
C (T×K) shows the time activity of each factor

[Figure: X (N×M×T, users × terms × time) = A (N×K, users × factors), B (M×K, terms × factors), C (T×K, time × factors)]

Page 24

http://www.datainterfaces.org/2013/06/twitter-topic-explorer/

Page 25

TOOLS FOR TENSOR DECOMPOSITION

Page 26

TOOLS FOR TENSOR FACTORIZATION

Page 27

TOOLS: THE PYTHON WORLD

NumPy SciPy

Scikit-Tensor (under development): github.com/mnick/scikit-tensor

NTF: gist.github.com/panisson/7719245

Page 28

TENSOR DECOMPOSITION OF WEARABLE SENSOR DATA

Page 29

Page 30

recorded proximity data

direct proximity sensing

Page 31

primary school

Lyon, France primary school 231 students 10 teachers

Page 32

Hong Kong primary school 900 students 65 teachers

Page 33

SocioPatterns.org

7 years, 30+ deployments, 10 countries, 50,000+ persons
• Mongan Institute for Health Policy, Boston
• US Army Medical Component of the Armed Forces, Bangkok
• School of Public Health of the University of Hong Kong
• KEMRI Wellcome Trust, Kenya
• London School for Hygiene and Tropical Medicine, London
• Public Health England, London
• Saw Swee Hock School of Public Health, Singapore

Page 34

TENSORS

Page 35

0 1 0

1 0 1

0 1 0

FROM TEMPORAL GRAPHS TO 3-WAY TENSORS
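One way to picture the step from a temporal graph to a 3-way tensor is to stack one adjacency matrix per time interval. The sketch below is a minimal illustration with made-up data: the `contacts` list, the 60-second `interval` and the choice to store contact counts (rather than 0/1 entries) are all assumptions of this example.

```python
import numpy as np

# Hypothetical timestamped contacts: (time in seconds, node i, node j)
contacts = [(0, 0, 1), (20, 1, 2), (40, 0, 1), (70, 1, 2), (130, 0, 2)]
n_nodes = 3
interval = 60   # seconds per time slice (assumed)

n_slices = max(t for t, _, _ in contacts) // interval + 1
T = np.zeros((n_nodes, n_nodes, n_slices))

for t, i, j in contacts:
    s = t // interval
    # Undirected contact: fill both entries of that slice's adjacency matrix
    T[i, j, s] += 1
    T[j, i, s] += 1

# Binary version: 1 if the two nodes were in contact during the interval
T_binary = (T > 0).astype(int)
```

Each frontal slice `T[:, :, s]` is the symmetric adjacency matrix of the contact graph aggregated over interval `s`.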

Page 36

structures in temporal networks

[Figure: pipeline from a temporal network to its tensorial representation, then tensor factorization into factors A, B (communities: nodes × components) and C (temporal activity: time × components). Factorization quality metrics, plotted per number of components, are used to tune the complexity of the model. Panel caption: "Temporal activity of each community", with communities labelled by school class (1A, 1B, 2A, 2B, 3A, 3B, 4A, 4B, 5A, 5B) and activity plotted over time intervals.]

Page 37

TENSOR DECOMPOSITION OF SCHOOL NETWORK

L. Gauvin et al., PLoS ONE 9(1), e86028 (2014)

[Figure: components labelled by school class: 1A, 1B, 2A, 2B, 3A, 3B, 4A, 4B, 5A, 5B]

Page 38

https://github.com/panisson/ntf-school

Page 39

ANOMALY DETECTION IN TEMPORAL NETWORKS

Page 40

ANOMALY DETECTION IN TEMPORAL NETWORKS

A. Sapienza et al. "Detecting anomalies in time-varying networks using tensor decomposition", ICDM Data Mining in Networks

Page 41

anomaly detection in temporal networks

Page 42

Laetitia Gauvin, Ciro Cattuto, Anna Sapienza

.fit().predict()

Page 43

@apanisson

thank you