Introduction to Tensor: Intelligent Computing for Computational Intelligence in the Post-Moore's-Law Era

Xiao-Yang Liu, www.tensorlet.com

April 19, 2019



Page 2

Agenda

- Background
- Tensor Decompositions (CP, Tucker, and Tensor-Train/Tensor-Ring)
- Transform-based Tensor Model and Applications
- Tensor Computations (cuTensor, TenDeC++)

Page 3

Background

In sensing systems, the number and resolution of sensors have grown to the point that multidimensional data of exceedingly large volume, variety, and structural richness have become ubiquitous across disciplines in engineering and data science.

Page 4

Background

Many problems in computational neuroscience, neuroinformatics, pattern/image recognition, signal processing, and machine learning generate massive amounts of multidimensional data with multiple aspects and high dimensionality. These data have the four "V" characteristics listed below.

Tensors provide a natural and compact representation for such massive multidimensional data via suitable low-rank approximations. Dynamic analysis of these approximations allows us to discover meaningful hidden structures in complex data and to perform generalizations by capturing multi-linear and multi-aspect relationships.

- Volume: scale of data
- Variety: different forms of data
- Veracity: uncertainty of data
- Velocity: analysis of streaming data

Page 5

What Is a Tensor?

Page 6

What Is a Tensor?

Page 7

Tensor Fibers

Page 8

Tensor Slices

Page 9

Tensor Unfolding

A(i) denotes the mode-i unfolding of the tensor A.

Page 10

Tensor Notations

- Scalars are denoted by lowercase letters, e.g., a.
- Vectors (tensors of order one) are denoted by boldface lowercase letters, e.g., a.
- Matrices (tensors of order two) are denoted by boldface capital letters, e.g., A.
- Higher-order tensors (order three or higher) are denoted by boldface Euler script letters, e.g., X.
- The n-th element in a sequence is denoted by a superscript in parentheses, e.g., A^(n) denotes the n-th matrix in a sequence.
- "◦" represents the vector outer product.
- The n-mode product of a tensor X ∈ R^(I1×I2×···×Id) with a matrix U ∈ R^(J×In) is denoted by X ×n U and is of size I1 × ··· × I(n−1) × J × I(n+1) × ··· × Id, with entries

  (X ×n U)_(i1···i(n−1) j i(n+1)···id) = Σ_(in=1..In) x_(i1 i2 ··· id) u_(j in).
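As a small illustration (not from the slides), the n-mode product can be computed in NumPy by moving mode n to the front, contracting, and moving it back; the helper name `mode_n_product` is ours:

```python
import numpy as np

def mode_n_product(X, U, n):
    # (X x_n U): contract mode n of X with the columns of U.
    Xn = np.moveaxis(X, n, 0)                  # bring mode n to the front
    Yn = np.tensordot(U, Xn, axes=([1], [0]))  # sum over i_n
    return np.moveaxis(Yn, 0, n)               # restore the mode ordering

X = np.arange(24.0).reshape(2, 3, 4)           # I1 x I2 x I3
U = np.ones((5, 3))                            # J x I2
Y = mode_n_product(X, U, 1)
assert Y.shape == (2, 5, 4)                    # I1 x J x I3
```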

Page 11

Tensor Notations

- The Kronecker product of matrices A ∈ R^(I×J) and B ∈ R^(K×L) is denoted by A ⊗ B. The result is a matrix of size (IK) × (JL), defined by

  A ⊗ B = [ a11 B   a12 B   ···   a1J B
            a21 B   a22 B   ···   a2J B
             ⋮        ⋮      ⋱      ⋮
            aI1 B   aI2 B   ···   aIJ B ]

        = [ a1 ⊗ b1   a1 ⊗ b2   a1 ⊗ b3   ···   aJ ⊗ b(L−1)   aJ ⊗ bL ].

- The Khatri–Rao product is the "matching columnwise" Kronecker product. Given matrices A ∈ R^(I×K) and B ∈ R^(J×K), their Khatri–Rao product is denoted by A ⊙ B. The result is a matrix of size (IJ) × K, defined by

  A ⊙ B = [ a1 ⊗ b1   a2 ⊗ b2   ···   aK ⊗ bK ].
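A quick NumPy sketch of both products (the `khatri_rao` helper is ours; NumPy ships only `np.kron`):

```python
import numpy as np

# Kronecker product: A (I x J), B (K x L) -> (IK) x (JL)
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [5.0, 6.0]])
K = np.kron(A, B)
assert K.shape == (4, 4)
assert K[0, 1] == A[0, 0] * B[0, 1]            # top-left block is a11 * B

def khatri_rao(A, B):
    # Columnwise Kronecker product: A (I x R), B (J x R) -> (IJ) x R
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

KR = khatri_rao(A, B)
assert KR.shape == (4, 2)
assert np.allclose(KR[:, 0], np.kron(A[:, 0], B[:, 0]))
```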

Page 12

Agenda

- Background
- Tensor Decompositions (CP, Tucker, and Tensor-Train/Tensor-Ring)
- Transform-based Tensor Model and Applications
- Tensor Computations (cuTensor, TenDeC++)

Page 13

Rank-One Tensor

A d-way tensor X ∈ R^(I1×I2×···×Id) is rank one if it can be written as the outer product of d vectors:

X = a^(1) ◦ a^(2) ◦ ··· ◦ a^(d).

Example. For a rank-one third-order tensor X = a ◦ b ◦ c, the (i, j, k) element of X is given by x_ijk = a_i b_j c_k.
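The rank-one example above can be checked directly in NumPy:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0])
c = np.array([6.0, 7.0, 8.0, 9.0])

# Rank-one third-order tensor X = a o b o c via the vector outer product
X = np.einsum('i,j,k->ijk', a, b, c)
assert X.shape == (3, 2, 4)
# Entrywise: x_ijk = a_i * b_j * c_k
assert X[2, 1, 3] == a[2] * b[1] * c[3]
```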

Page 14

CP Decomposition

X ≈ Σ_(r=1..R) λr (ar ◦ br ◦ cr) = Λ ×1 A ×2 B ×3 C = [[Λ; A, B, C]]

X(1) = A Λ (C ⊙ B)^T + E(1)
X(2) = B Λ (C ⊙ A)^T + E(2)
X(3) = C Λ (B ⊙ A)^T + E(3)
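A hedged NumPy check of the mode-1 unfolding identity X(1) = A Λ (C ⊙ B)^T, using a noiseless X built from random factors (sizes, seed, and the `khatri_rao` helper are ours):

```python
import numpy as np

def khatri_rao(A, B):
    # Columnwise Kronecker product, (rows(A)*rows(B)) x R
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

rng = np.random.default_rng(0)
I, J, K, R = 3, 4, 5, 2
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))
lam = rng.standard_normal(R)

# X = sum_r lambda_r (a_r o b_r o c_r), with E = 0
X = np.einsum('r,ir,jr,kr->ijk', lam, A, B, C)

# Mode-1 unfolding: mode-2 index varies fastest, then mode-3
X1 = X.transpose(0, 2, 1).reshape(I, J * K)
assert np.allclose(X1, A @ np.diag(lam) @ khatri_rao(C, B).T)
```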

Page 15

CP Decomposition

Name                                          Proposed by
Polyadic form of a tensor                     Hitchcock, 1927
PARAFAC (parallel factors)                    Harshman, 1970
CANDECOMP or CAND (canonical decomposition)   Carroll and Chang, 1970
Topographic components model                  Möcks, 1988
CP (CANDECOMP/PARAFAC)                        Kiers, 2000

Table: Some of the many names for the CP decomposition.

Page 16

Tucker Decomposition

Y = G ×1 A ×2 B ×3 C + E = [[G; A, B, C]] + E

X(1) ≈ A G(1) (C ⊗ B)^T
X(2) ≈ B G(2) (C ⊗ A)^T
X(3) ≈ C G(3) (B ⊗ A)^T
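A minimal HOSVD sketch (one standard way to compute a Tucker decomposition; the helper names and the full-rank test case are ours):

```python
import numpy as np

def unfold(X, n):
    # Mode-n unfolding (column order is irrelevant for the SVD column space)
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def hosvd(X, ranks):
    # Factors: leading left singular vectors of each mode-n unfolding;
    # core: n-mode products of X with the factor transposes.
    U = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
         for n, r in enumerate(ranks)]
    G = X
    for n, Un in enumerate(U):
        G = np.moveaxis(np.tensordot(Un.T, np.moveaxis(G, n, 0), axes=1), 0, n)
    return G, U

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 5, 6))
G, (A, B, C) = hosvd(X, (4, 5, 6))
# With full ranks the Tucker form reconstructs X exactly.
Xhat = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)
assert np.allclose(Xhat, X)
```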

Page 17

Tucker Decomposition

Name                                        Proposed by
Three-mode factor analysis (3MFA/Tucker3)   Tucker, 1966
Three-mode PCA (3MPCA)                      Kroonenberg and De Leeuw, 1980
N-mode PCA                                  Kapteyn et al., 1986
Higher-order SVD (HOSVD)                    De Lathauwer et al., 2000
N-mode SVD                                  Vasilescu and Terzopoulos, 2002

Table: Names for the Tucker decomposition (some specific to three-way and some for N-way).

Page 18

Tensor Train Decomposition

TT Form

A_(i1 i2 ··· id) = G1(i1) G2(i2) ··· Gd(id),

where G1(i1) is of size 1 × R, each intermediate core Gk(ik) is of size R × R, and Gd(id) is of size R × 1.

A graphical representation of the tensor train decomposition
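A small NumPy sketch of the TT form for a third-order tensor (core sizes and helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
I1, I2, I3, R = 3, 4, 5, 2
# TT cores, stored as (rank_in, mode size, rank_out)
G1 = rng.standard_normal((1, I1, R))
G2 = rng.standard_normal((R, I2, R))
G3 = rng.standard_normal((R, I3, 1))

def tt_entry(cores, idx):
    # A[i1,...,id] = G1(i1) G2(i2) ... Gd(id): a product of small matrices
    M = np.eye(1)
    for G, i in zip(cores, idx):
        M = M @ G[:, i, :]
    return M.item()

# Contract all cores to recover the full tensor, then spot-check an entry
A = np.einsum('aib,bjc,ckd->ijk', G1, G2, G3)
assert A.shape == (I1, I2, I3)
assert np.isclose(A[1, 2, 3], tt_entry([G1, G2, G3], (1, 2, 3)))
```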

Page 19

Tensor Ring Decomposition

Example.

Page 20

Tensor Ring Decomposition

TR Form

A_(i1 i2 ··· id) = Tr{ Z1(i1) Z2(i2) ··· Zd(id) } = Tr{ Π_(k=1..d) Zk(ik) }

A graphical representation of the tensor ring decomposition
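The trace form above can be sketched in NumPy (equal ring ranks and the helper name are our simplifying assumptions; in general the ranks may differ per core):

```python
import numpy as np

rng = np.random.default_rng(3)
shape, R = (3, 4, 5), 2
# Tensor-ring cores: each Z_k(i_k) is an R x R matrix
cores = [rng.standard_normal((R, n, R)) for n in shape]

def tr_entry(cores, idx):
    # A[i1,...,id] = trace( Z1(i1) Z2(i2) ... Zd(id) )
    M = np.eye(cores[0].shape[0])
    for Z, i in zip(cores, idx):
        M = M @ Z[:, i, :]
    return np.trace(M)

# The trace closes the ring: contract the boundary ranks cyclically
A = np.einsum('aib,bjc,cka->ijk', *cores)
assert np.isclose(A[2, 1, 0], tr_entry(cores, (2, 1, 0)))
```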

Page 21

Agenda

- Background
- Tensor Decompositions (CP, Tucker, and Tensor-Train/Tensor-Ring)
- Transform-based Tensor Model and Applications
- Tensor Computations (cuTensor, TenDeC++)

Page 22

Transform-based Model

Basic Operators

The operator matview(·) takes a tensor A ∈ C^(n1×n2×n3×n4) and returns an n1n3n4 × n2n3n4 block diagonal matrix, with each block being an n1 × n2 matrix, defined as

matview(A) = diag(A1, ···, Ap, ···, AP), p ∈ [P],

and

Ap(i, j) = A(i, j, k, l), p = (l − 1)n3 + k, i ∈ [n1], j ∈ [n2], k ∈ [n3], l ∈ [n4],

where P = n3n4 and [n] denotes the index set {1, 2, ···, n}. The operator tenview(·) folds matview(A) back into the tensor A, i.e.,

tenview(matview(A)) = A.
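A 0-indexed NumPy sketch of these two operators (so p = l·n3 + k here corresponds to the slide's p = (l − 1)n3 + k):

```python
import numpy as np

def matview(A):
    # A: (n1, n2, n3, n4) -> block-diagonal (n1*n3*n4, n2*n3*n4);
    # block p = l*n3 + k is the frontal matrix A[:, :, k, l].
    n1, n2, n3, n4 = A.shape
    P = n3 * n4
    M = np.zeros((n1 * P, n2 * P), dtype=A.dtype)
    for l in range(n4):
        for k in range(n3):
            p = l * n3 + k
            M[p*n1:(p+1)*n1, p*n2:(p+1)*n2] = A[:, :, k, l]
    return M

def tenview(M, shape):
    # Inverse of matview: gather the diagonal blocks back into a tensor.
    n1, n2, n3, n4 = shape
    A = np.empty(shape, dtype=M.dtype)
    for l in range(n4):
        for k in range(n3):
            p = l * n3 + k
            A[:, :, k, l] = M[p*n1:(p+1)*n1, p*n2:(p+1)*n2]
    return A

A = np.random.default_rng(4).standard_normal((2, 3, 4, 5))
assert matview(A).shape == (2 * 20, 3 * 20)
assert np.allclose(tenview(matview(A), A.shape), A)
```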

Page 23

Transform-based Model

Basic Operators

Given two fourth-order tensors A ∈ C^(n1×n′×n3×n4) and B ∈ C^(n′×n2×n3×n4), the corresponding p-th matrices are Ap ∈ C^(n1×n′) and Bp ∈ C^(n′×n2), and their multiplication is well-defined as Cp = Ap Bp ∈ C^(n1×n2). Later, in the transform domain, we will need the following multiplication of two block diagonal matrices:

matview(C) = matview(A) · matview(B),

where · denotes conventional matrix multiplication.

Page 24

Transform-based Model

Tensor-scalar multiplication

Given an invertible 2D discrete transform L : C^(1×1×n3×n4) → C^(1×1×n3×n4), the element-wise multiplication ◦, and α, β ∈ C^(1×1×n3×n4), we define the tensor-scalar multiplication

α • β := L^(−1)(L(α) ◦ L(β)),

where L^(−1) : C^(1×1×n3×n4) → C^(1×1×n3×n4) is the inverse transform.

Tensor-linear combinations

Given tensor scalars cj ∈ C^(1×1×n3×n4), j ∈ [n2], a tensor-linear combination of the tensor-columns Aj ∈ C^(n1×1×n3×n4), j ∈ [n2], is defined as

A1 • c1 + ··· + A(n2) • c(n2) = A • c,

where A = [A1, ···, A(n2)] and c = [c1, ···, c(n2)]^T.
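A sketch of tensor-scalar multiplication with L chosen as the 2D DFT along the last two modes (one concrete choice of invertible transform; the function name is ours):

```python
import numpy as np

def tsmul(alpha, beta):
    # alpha • beta = L^{-1}( L(alpha) o L(beta) ), with L = 2D DFT
    Fa = np.fft.fft2(alpha, axes=(2, 3))
    Fb = np.fft.fft2(beta, axes=(2, 3))
    return np.real(np.fft.ifft2(Fa * Fb, axes=(2, 3)))

rng = np.random.default_rng(5)
a = rng.standard_normal((1, 1, 3, 4))
b = rng.standard_normal((1, 1, 3, 4))

# The transform diagonalizes the product, so • is commutative here
assert np.allclose(tsmul(a, b), tsmul(b, a))

# The "delta" scalar acts as the multiplicative identity
e = np.zeros((1, 1, 3, 4)); e[0, 0, 0, 0] = 1.0
assert np.allclose(tsmul(e, b), b)
```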

Page 25

Transform-based Model

L-product

The L-product C = A • B ∈ C^(n1×n2×n3×n4) of A ∈ C^(n1×n′×n3×n4) and B ∈ C^(n′×n2×n3×n4) is defined as

C(i, j) = Σ_(k∈[n′]) A(i, k) • B(k, j), i ∈ [n1], j ∈ [n2].

Lemma. The L-product C = A • B can be calculated in the following way. First, compute

matview(C̄) = matview(Ā) · matview(B̄).

Then stack matview(C̄) back into the tensor C̄ = tenview(matview(C̄)) and perform the inverse transform to get C, i.e., C = L^(−1)(C̄). The notation Ā denotes the transform-domain representation of A ∈ C^(n1×n2×n3×n4), such that Ā = L(A) and A = L^(−1)(Ā).

Page 26

Transform-based Model

So the L-product can be considered as

L(C(i, j)) = Σ_(k∈[n′]) L(A(i, k)) ◦ L(B(k, j)),

which can be represented as Cp = Ap Bp, p ∈ [P].
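The slice-wise products Cp = Ap Bp can be sketched in NumPy with L again taken as the 2D DFT (a concrete choice of transform; the function name is ours):

```python
import numpy as np

def l_product(A, B):
    # C = A • B: transform, multiply the frontal n1 x n2 slices for
    # every (k, l) pair at once, then inverse transform.
    Fa = np.fft.fft2(A, axes=(2, 3))
    Fb = np.fft.fft2(B, axes=(2, 3))
    Fc = np.einsum('iakl,ajkl->ijkl', Fa, Fb)   # Cp = Ap Bp for all p
    return np.real(np.fft.ifft2(Fc, axes=(2, 3)))

rng = np.random.default_rng(6)
A = rng.standard_normal((2, 3, 4, 5))
B = rng.standard_normal((3, 2, 4, 5))
C = l_product(A, B)
assert C.shape == (2, 2, 4, 5)
```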

Page 27

Low-tubal-rank Model

Notations

~Ai ≡ A(:, i, :),   A^(j) ≡ A(:, :, j),   Ā := fft(A, [], 3)

bcirc(A) = [ A^(1)    A^(n)     A^(n−1)  ···   A^(2)
             A^(2)    A^(1)     A^(n)    ···   A^(3)
              ⋮        ⋮          ⋱             ⋮
             A^(n)    A^(n−1)   ···      A^(2) A^(1) ]

unfold(A) = [ A^(1)
              A^(2)
               ⋮
              A^(n) ],   fold(unfold(A)) = A
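A 0-indexed NumPy sketch of these three operators (helper names are ours):

```python
import numpy as np

def bcirc(A):
    # Block-circulant matrix of A in R^{n1 x n2 x n}: (n1*n) x (n2*n);
    # row-block i, column-block j holds the frontal slice A[:, :, (i-j) mod n]
    n1, n2, n = A.shape
    return np.block([[A[:, :, (i - j) % n] for j in range(n)]
                     for i in range(n)])

def unfold(A):
    # Stack the frontal slices vertically
    n1, n2, n = A.shape
    return np.concatenate([A[:, :, j] for j in range(n)], axis=0)

def fold(M, shape):
    # Inverse of unfold
    n1, n2, n = shape
    return np.stack([M[j*n1:(j+1)*n1, :] for j in range(n)], axis=2)

A = np.random.default_rng(7).standard_normal((3, 2, 4))
assert bcirc(A).shape == (12, 8)
assert np.allclose(fold(unfold(A), A.shape), A)
```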

Page 28

Low-tubal-rank Model

t-Product

A ∗ B = fold(bcirc(A) · unfold(B))

Example. Let A ∈ R^(3×2×2) with frontal slices

A^(1) = [  1  0        A^(2) = [ −2  1
           0  2                  −2  7
          −1  3 ],                0 −1 ],

and let ~B ∈ R^(2×1×2) with frontal slices

B^(1) = [  3           B^(2) = [ −2
          −1 ],                  −3 ].

Page 29

Low-tubal-rank Model

A ∗ ~B = fold( [  1  0 −2  1       [  3
                  0  2 −2  7         −1
                 −1  3  0 −1    ·    −2
                 −2  1  1  0         −3 ] )
                 −2  7  0  2
                  0 −1 −1  3 ]

       = fold( [  4
                −19
                 −3
                 −9
                −19
                 −6 ] ) ∈ R^(3×1×2).

In other words, ~C := A ∗ ~B is a 3 × 1 × 2 tensor: a 3 × 2 matrix oriented as a lateral slice of a third-order tensor.
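The worked example can be verified in NumPy; here the t-product is computed equivalently in the Fourier domain (slice-wise products after an FFT along the third dimension), which matches the bcirc · unfold definition:

```python
import numpy as np

def t_product(A, B):
    # t-product via FFT along the third dimension
    Fa, Fb = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    Fc = np.einsum('iak,ajk->ijk', Fa, Fb)     # slice-wise matrix products
    return np.real(np.fft.ifft(Fc, axis=2))

A = np.zeros((3, 2, 2))
A[:, :, 0] = [[1, 0], [0, 2], [-1, 3]]
A[:, :, 1] = [[-2, 1], [-2, 7], [0, -1]]
B = np.zeros((2, 1, 2))
B[:, :, 0] = [[3], [-1]]
B[:, :, 1] = [[-2], [-3]]

C = t_product(A, B)
assert np.allclose(C[:, 0, 0], [4, -19, -3])   # first frontal slice
assert np.allclose(C[:, 0, 1], [-9, -19, -6])  # second frontal slice
```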

Page 30

Low-tubal-rank Model

t-Linear combinations

Given k tubal scalars ~cj ∈ R^(1×1×n), j = 1, 2, ···, k, a t-linear combination of ~Xj ∈ R^(m×1×n), j = 1, 2, ···, k, is defined as

~X1 ∗ ~c1 + ~X2 ∗ ~c2 + ··· + ~Xk ∗ ~ck ≡ X ∗ ~C,

where

X := [~X1, ~X2, ···, ~Xk],   ~C := [~c1; ~c2; ···; ~ck].

Example. Using A ∈ R3×2×2 and ~B ∈ R2×1×2 from the previous example, we see that

Page 31

Low-tubal-rank Model

A ∗ ~B = ~A1 ∗ ~b11 + ~A2 ∗ ~b21

       = fold( [  7          + fold( [ −3
                  4                   −23
                 −3                     0
                 −8                    −1
                 −6                   −13
                  2 ] )                −8 ] )

       = fold( [  4
                −19
                 −3
                 −9
                −19
                 −6 ] )

Thus, ~C := A ∗ ~B is a t-linear combination of the lateral slices of A.

Page 32

Low-tubal-rank Model

Observation

Given ~a, ~b ∈ R^(1×1×n), ~a ∗ ~b can be computed as

~a ∗ ~b := ifft( fft(~a, [], 3) ⊙ fft(~b, [], 3), [], 3 ),

where ⊙ denotes pointwise multiplication of two tubal scalars.

Factorizations of A are created (implicitly) by applying the appropriate matrix factorization to each frontal slice A^(i) in the Fourier domain:

A = Q ∗ R ⟺ Ā^(i) = Q̄^(i) R̄^(i).

Page 33

Low-tubal-rank Model

t-SVD

A = U ∗ S ∗ V^T = Σ_(i=1..min(l,m)) ~Ui ∗ ~si ∗ ~Vi^T,   ~si := S(i, i, :)

The t-SVD of an l × m × n tensor
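A minimal t-SVD sketch in the style of Kilmer and Martin: FFT along the third dimension, an ordinary SVD per frontal slice, inverse FFT back (helper names are ours; the factors are kept complex and the tiny imaginary residue is checked at the end):

```python
import numpy as np

def t_svd(A):
    l, m, n = A.shape
    Fa = np.fft.fft(A, axis=2)
    Uf = np.zeros((l, l, n), dtype=complex)
    Sf = np.zeros((l, m, n), dtype=complex)
    Vf = np.zeros((m, m, n), dtype=complex)
    for j in range(n):                          # SVD of each Fourier slice
        u, s, vh = np.linalg.svd(Fa[:, :, j])
        Uf[:, :, j], Vf[:, :, j] = u, vh.conj().T
        for i in range(min(l, m)):
            Sf[i, i, j] = s[i]
    back = lambda T: np.fft.ifft(T, axis=2)     # return to the time domain
    return back(Uf), back(Sf), back(Vf)

def t_product(A, B):
    Fa, Fb = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.fft.ifft(np.einsum('iak,ajk->ijk', Fa, Fb), axis=2)

def t_transpose(A):
    # Conjugate-transpose each slice, then reverse slices 2..n
    At = np.conj(np.transpose(A, (1, 0, 2)))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

A = np.random.default_rng(8).standard_normal((4, 3, 5))
U, S, V = t_svd(A)
Ahat = t_product(t_product(U, S), t_transpose(V))
assert np.allclose(Ahat, A)
assert np.max(np.abs(Ahat.imag)) < 1e-8
```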

Page 34

Tubal-tensor Sparse Coding

Tubal-tensor Linear Combination

A two-dimensional image of size m × k is represented by a third-order tensor X ∈ R^(m×1×k), which can be approximated by the t-product of D ∈ R^(m×r×k) and C ∈ R^(r×1×k):

X = D ∗ C = D(:, 1, :) ∗ C(1, 1, :) + D(:, 2, :) ∗ C(2, 1, :) + ··· + D(:, r, :) ∗ C(r, 1, :).

The tubal-tensor sparse coding model is based on the circular convolution operation.

Page 35

Tubal-tensor Sparse Coding

Tubal-tensor Sparse Representation

A third-order tensor X ∈ R^(m×n×k) represents n images of size m × k. Let D ∈ R^(m×r×k) be the tensor dictionary, where each lateral slice D(:, j, :) is a tensor basis, and let C ∈ R^(r×n×k) hold the corresponding tensor representations. Each image X(:, j, :) is approximated by a sparse t-linear combination of the tensor bases. The tubal-tensor sparse coding (TubSC) model can be formulated as

min_(D,C) (1/2)‖X − D ∗ C‖²_F + β‖C‖₁
s.t. ‖D(:, j, :)‖²_F ≤ 1, j = 1, 2, ···, r.

The TubSC model can be solved by alternating between tensor coefficient learning and tensor dictionary learning.

Page 36

Tubal-tensor Sparse Coding

Tensor Coefficients Learning

min_C (1/2)‖X − D ∗ C‖²_F + β‖C‖₁

According to the low-tubal-rank model, the problem can be transformed to

min_(unfold(C)) (1/2)‖unfold(X) − bcirc(D) · unfold(C)‖²_F + β‖unfold(C)‖₁.

It can be solved by the Iterative Shrinkage Thresholding algorithm based on Tensors (ISTT), for which the problem is rewritten as

min_C f(C) + β g(C),

where f(C) = (1/2)‖X − D ∗ C‖²_F and g(C) = ‖C‖₁.
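As a hedged illustration of this step, here is plain ISTA on the equivalent matrix form, where `A` stands in for bcirc(D) and `x` for unfold(X) (sizes, seed, and names are ours; ISTT itself operates on tensors):

```python
import numpy as np

def soft_threshold(Z, tau):
    # Proximal operator of tau * ||.||_1 (the "shrinkage" step)
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def ista(A, x, beta, iters=200):
    # min_c 0.5 * ||x - A c||^2 + beta * ||c||_1
    L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ c - x)        # gradient of the smooth part f
        c = soft_threshold(c - grad / L, beta / L)
    return c

rng = np.random.default_rng(9)
A = rng.standard_normal((30, 10))
x = A @ (np.eye(10)[0] * 2.0)           # sparse ground truth: 2 * e_0
c = ista(A, x, beta=0.1)
assert np.argmax(np.abs(c)) == 0        # the dominant coefficient is recovered
```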

Page 37

Tubal-tensor Sparse Coding

Tensor Dictionary Learning

min_D (1/2)‖X − D ∗ C‖²_F
s.t. ‖D(:, j, :)‖²_F ≤ 1, j = 1, 2, ···, r.

We transform this problem into the frequency domain:

min_(D^(l)) Σ_(l=1..k) ‖X^(l) − D^(l) C^(l)‖²_F, l = 1, 2, ···, k
s.t. Σ_(l=1..k) ‖D^(l)(:, j)‖²_F ≤ k, j = 1, 2, ···, r.

Then the Lagrange dual (Lee et al., 2007) is adopted, solving for the dual variables by Newton's algorithm.

Page 38

Agenda

- Background
- Tensor Decompositions (CP, Tucker, and Tensor-Train/Tensor-Ring)
- Transform-based Tensor Model and Applications
- Tensor Computations (cuTensor, TenDeC++)

Page 39

cuTensor-tubal (GPU)

This library is a general approach to computing low-tubal-rank tensor operations in the frequency domain on GPUs:

1. Obtain the frequency-domain representation of the input tensor by performing the Fourier transform along the third dimension (tube-wise DFT) on the GPU.

2. In the frequency domain, the tensor operations separate into multiple independent complex matrix computations with strong parallelism.

3. Convert the frequency-domain results back to the time domain through the inverse Fourier transform along the third dimension on the GPU (tube-wise inverse DFT).

System architecture of the cuTensor-tubal library

Page 40

cuTensor-tubal (GPU)

Operation        Input                             Output
t-FFT            A ∈ R^(m×n×k)                     A ∈ C^(m×n×k)
inverse t-FFT    A ∈ C^(m×n×k)                     A ∈ R^(m×n×k)
t-product        A ∈ R^(m×l×k), B ∈ R^(l×n×k)      C ∈ R^(m×n×k)
t-SVD            T ∈ R^(m×n×k)                     U ∈ R^(m×m×k), S ∈ R^(m×n×k), V ∈ R^(n×n×k)
t-QR             T ∈ R^(m×n×k)                     Q ∈ R^(m×m×k), R ∈ R^(m×n×k)
t-inverse        T ∈ R^(n×n×k)                     T^(−1) ∈ R^(n×n×k)
t-normalization  T ∈ R^(m×1×k)                     T ∈ R^(m×1×k)

Table: Seven tensor operations in the cuTensor-tubal library

Page 41

cuTensor-tubal (GPU)

Key Challenges

- Data transfer between the CPU and GPU
- Alternative access to tube and slice data structures
- Parallelizing the Fourier transforms and matrix computations

Page 42

cuTensor-tubal (GPU)

Efficient Data Transfer

Overlapping data transfer with computations

Page 43

cuTensor-tubal (GPU)

Uniform Memory Access to Tube and Slice Data Structures

Tensors are stored as a 1D array in memory
Data structures in tensor computations

Page 44

cuTensor-tubal (GPU)

Parallelizing the Fourier Transforms and Matrix Computations

Operation        #(FFT operations)   #(matrix operations)   #(inverse FFT operations)
t-FFT            m × n               None                   None
inverse t-FFT    None                None                   m × n
t-product        m × l + l × n       k                      m × n
t-SVD            m × n               k                      m × m + n × n + m × n
t-QR             m × n               k                      m × m + m × n
t-inverse        n × n               k                      n × n
t-normalization  m                   k                      m

Table: FFT, matrix, and inverse-FFT workloads of the seven tensor operations in the cuTensor-tubal library

Page 45

cuTensor-tubal (GPU)

System workflow of the cuTensor-tubal library

Memory access operators

Page 46

TenDeC++ (CPU)

TenDeC++ is a new C++ library for tensor decompositions, in which a novel underlying technique, PointerDeformer, leveraging pointer manipulation, is proposed to further explore the potential of C++. TenDeC++ supports:

- Canonical Polyadic (CP) decomposition
- Tucker decomposition
- Tensor-train decomposition
- t-SVD

Compared with TensorLy in Python and Tensorlab in MATLAB, TenDeC++ reduces decomposition time by more than 83.7% and 53.3%, respectively, and supports tensors 2.5× and 2× larger.

System architecture of the TenDeC++library

Page 47

TenDeC++ (CPU)

PointerDeformer

A 3D tensor is stored as a 1D array in memory. Accessing this data in different orders can form size-specific matrices, including three mode-n views: the column-major, row-major, and concatenation views. These virtual views motivate the design of PointerDeformer, which skips the time-consuming unfolding operation in C++.
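The same "virtual view" idea can be demonstrated in NumPy, where changing strides reinterprets the one contiguous buffer without copying (an analogy to PointerDeformer, not its actual C++ implementation):

```python
import numpy as np

X = np.arange(24).reshape(2, 3, 4)      # one contiguous 1D buffer underneath

# A mode-n "view": moving the mode to the front only changes the strides,
# not the underlying memory.
view = np.moveaxis(X, 1, 0)             # no copy: same buffer, new strides
assert np.shares_memory(view, X)

# The unfolding matrix only materializes when reshape is forced to copy.
X1 = view.reshape(3, 8)
assert X1[1, 0] == X[0, 1, 0]
```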

Page 48

TenDeC++ (CPU)

Optimized Basic Tensor Operation: n-mode Product

Compared with the traditional process, the optimized process does not need the time-consuming unfold/fold operations. Instead, PointerDeformer achieves the virtual transformation by accessing the data in a specific sequence in memory.

Page 49

TenDeC++ (CPU)

Other Acceleration Techniques

- Exploit symmetry with PointerDeformer
- Exploit conjugate symmetry for the t-SVD decomposition

Based on the conjugate-symmetry property of the FFT of real input data, conj(X^(j)) = X^(k−j+2) for j = 2, 3, ···, ⌈(k+1)/2⌉, where conj(X) denotes the conjugate of the matrix X. Taking conjugates on both sides of X^(j) = U^(j) S^(j) V^(j) gives

X^(k−j+2) = conj(U^(j)) · conj(S^(j)) · conj(V^(j)).

Hence, with conjugate symmetry, the t-SVD decomposition only needs to perform SVDs on about half of the frontal slices.
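The conjugate-symmetry property itself is easy to check numerically (0-indexed slices, so the slide's j and k − j + 2 become j and k − j here):

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.standard_normal((3, 3, 6))          # real tensor, k = 6 frontal slices
Xf = np.fft.fft(X, axis=2)

# FFT of real data: Xf[:, :, j] == conj(Xf[:, :, k - j]) for j = 1..k-1,
# so SVDs are only needed for slices 0 .. k//2; the rest follow by conjugation.
k = X.shape[2]
for j in range(1, k):
    assert np.allclose(Xf[:, :, j], np.conj(Xf[:, :, k - j]))
```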

Page 50

TenDeC++ (CPU)

Performance

Running time of CP decomposition
Running time of Tucker decomposition

Page 51

TenDeC++ (CPU)

Performance

Running time of t-SVD