Introduction to Tensor
Intelligent Computing for Computational Intelligence in the Post-Moore's-Law Era

Xiao-Yang Liu
www.tensorlet.com
April 19, 2019
Agenda
- Background
- Tensor Decompositions (CP, Tucker, and Tensor-Train/Tensor-Ring)
- Transform-based Tensor Model and Applications
- Tensor Computations (cuTensor, TenDeC++)
Background

In sensing systems, the number and resolution of sensors have grown to the point that multidimensional data of exceedingly large volume, variety, and structural richness have become ubiquitous across disciplines in engineering and data science.
Background

Many problems in computational neuroscience, neuroinformatics, pattern/image recognition, signal processing, and machine learning generate massive amounts of multidimensional data with multiple aspects and high dimensionality. These data exhibit four "V" characteristics:

- Volume: the scale of data
- Variety: the different forms of data
- Veracity: the uncertainty of data
- Velocity: the analysis of streaming data

Tensors provide a natural and compact representation for such massive multidimensional data via suitable low-rank approximations. Dynamic analysis of these approximations allows us to discover meaningful hidden structures of complex data and to perform generalizations by capturing multi-linear and multi-aspect relationships.
Tensor Unfolding
A_(i) denotes the mode-i unfolding of a tensor A: the matrix whose columns are the mode-i fibers of A.
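As an illustration (not part of the original slides), here is a minimal numpy sketch of mode-i unfolding, following the fiber ordering of Kolda and Bader:

```python
import numpy as np

def unfold(A, mode):
    """Mode-`mode` unfolding: bring the chosen axis to the front, then
    flatten the remaining axes in column-major (Fortran) order."""
    return np.reshape(np.moveaxis(A, mode, 0),
                      (A.shape[mode], -1), order="F")

X = np.arange(24).reshape(3, 4, 2)
print(unfold(X, 0).shape)  # (3, 8): the mode-1 unfolding X_(1)
```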
Tensor Notations

- Scalars are denoted by lowercase letters, e.g., a.
- Vectors (tensors of order one) are denoted by boldface lowercase letters, e.g., a.
- Matrices (tensors of order two) are denoted by boldface capital letters, e.g., A.
- Higher-order tensors (order three or higher) are denoted by boldface Euler script letters, e.g., X.
- The n-th element in a sequence is denoted by a superscript in parentheses, e.g., A^(n) denotes the n-th matrix in a sequence.
- "◦" denotes the vector outer product.
- The n-mode product of a tensor X ∈ R^{I_1×I_2×···×I_d} with a matrix U ∈ R^{J×I_n} is denoted by X ×_n U and is of size I_1 × ··· × I_{n−1} × J × I_{n+1} × ··· × I_d, with entries (see the sketch below)

    (X ×_n U)_{i_1···i_{n−1} j i_{n+1}···i_d} = Σ_{i_n=1}^{I_n} x_{i_1 i_2···i_d} u_{j i_n}.
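A minimal numpy sketch of the n-mode product (an illustration, not from the original slides); np.tensordot contracts mode n of X with the columns of U:

```python
import numpy as np

def mode_n_product(X, U, n):
    """n-mode product X ×_n U: contract U's second index (size I_n)
    with mode n of X, leaving the new J axis in position n."""
    # tensordot puts the J axis first, so move it back to position n.
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

X = np.random.rand(3, 4, 5)
U = np.random.rand(6, 4)              # J × I_2
print(mode_n_product(X, U, 1).shape)  # (3, 6, 5)
```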
Tensor Notations

- The Kronecker product of matrices A ∈ R^{I×J} and B ∈ R^{K×L} is denoted by A ⊗ B. The result is a matrix of size (IK) × (JL), defined by

    A ⊗ B = [ a_{11}B  a_{12}B  ···  a_{1J}B
              a_{21}B  a_{22}B  ···  a_{2J}B
                ⋮        ⋮      ⋱      ⋮
              a_{I1}B  a_{I2}B  ···  a_{IJ}B ]
          = [ a_1 ⊗ b_1   a_1 ⊗ b_2   a_1 ⊗ b_3   ···   a_J ⊗ b_{L−1}   a_J ⊗ b_L ].

- The Khatri-Rao product is the "matching columnwise" Kronecker product. Given matrices A ∈ R^{I×K} and B ∈ R^{J×K}, their Khatri-Rao product is denoted by A ⊙ B. The result is a matrix of size (IJ) × K, defined by (see the sketch below)

    A ⊙ B = [ a_1 ⊗ b_1   a_2 ⊗ b_2   ···   a_K ⊗ b_K ].
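Both products are easy to check numerically; a small sketch (illustration only, assuming numpy):

```python
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(4, 3)

kron = np.kron(A, B)  # Kronecker product: (2·4) × (3·3)

# Khatri-Rao: column-wise Kronecker product, result (I·J) × K.
khatri_rao = np.column_stack(
    [np.kron(A[:, k], B[:, k]) for k in range(A.shape[1])])
print(kron.shape, khatri_rao.shape)  # (8, 9) (8, 3)
```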
Agenda
- Background
- Tensor Decompositions (CP, Tucker, and Tensor-Train/Tensor-Ring)
- Transform-based Tensor Model and Applications
- Tensor Computations (cuTensor, TenDeC++)
Rank-One Tensor

A d-way tensor X ∈ R^{I_1×I_2×···×I_d} is rank one if it can be written as the outer product of d vectors:

    X = a^(1) ◦ a^(2) ◦ ··· ◦ a^(d).

Example. For a rank-one third-order tensor X = a ◦ b ◦ c, the (i, j, k) element of X is given by x_{ijk} = a_i b_j c_k.
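A quick numerical check of the example (illustration only, assuming numpy):

```python
import numpy as np

a, b, c = np.random.rand(3), np.random.rand(4), np.random.rand(5)

# Rank-one tensor X = a ◦ b ◦ c built from the outer product:
X = np.einsum('i,j,k->ijk', a, b, c)
assert np.isclose(X[1, 2, 3], a[1] * b[2] * c[3])
print(X.shape)  # (3, 4, 5)
```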
CP Decomposition
    X ≈ Σ_{r=1}^{R} λ_r a_r ◦ b_r ◦ c_r = Λ ×_1 A ×_2 B ×_3 C = [[Λ; A, B, C]]

The mode-n unfoldings satisfy

    X_(1) = A Λ (C ⊙ B)^T + E_(1),
    X_(2) = B Λ (C ⊙ A)^T + E_(2),
    X_(3) = C Λ (B ⊙ A)^T + E_(3).
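The unfolded form above is the basis of the alternating least squares (ALS) algorithm: fix all factors but one and solve a linear least-squares problem. A minimal sketch (illustration only, assuming numpy; no normalization or convergence check):

```python
import numpy as np

def unfold(T, mode):
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order="F")

def khatri_rao(A, B):
    return np.column_stack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])])

def cp_als(X, R, n_iter=100):
    """CP-ALS for a third-order tensor: each step solves X_(n) ≈ F (KR)^T."""
    rng = np.random.default_rng(0)
    A, B, C = (rng.standard_normal((s, R)) for s in X.shape)
    for _ in range(n_iter):
        A = unfold(X, 0) @ np.linalg.pinv(khatri_rao(C, B)).T
        B = unfold(X, 1) @ np.linalg.pinv(khatri_rao(C, A)).T
        C = unfold(X, 2) @ np.linalg.pinv(khatri_rao(B, A)).T
    return A, B, C

# Fit a synthetic rank-2 tensor and check the reconstruction error:
A0, B0, C0 = (np.random.rand(s, 2) for s in (4, 5, 6))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(X, R=2)
Xhat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(X - Xhat) / np.linalg.norm(X))  # ≈ 0
```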
CP Decomposition
Name                                            Proposed by
Polyadic form of a tensor                       Hitchcock, 1927
PARAFAC (parallel factors)                      Harshman, 1970
CANDECOMP or CAND (canonical decomposition)     Carroll and Chang, 1970
Topographic components model                    Möcks, 1988
CP (CANDECOMP/PARAFAC)                          Kiers, 2000
Table: Some of the many names for the CP decomposition.
Tucker Decomposition
    Y = G ×_1 A ×_2 B ×_3 C + E = [[G; A, B, C]] + E

The mode-n unfoldings satisfy

    X_(1) ≈ A G_(1) (C ⊗ B)^T,
    X_(2) ≈ B G_(2) (C ⊗ A)^T,
    X_(3) ≈ C G_(3) (B ⊗ A)^T.
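One standard way to compute a Tucker model is the (truncated) higher-order SVD; a minimal sketch (illustration only, assuming numpy):

```python
import numpy as np

def unfold(T, mode):
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order="F")

def hosvd(X, ranks):
    """Truncated HOSVD: factor n holds the leading left singular vectors
    of the mode-n unfolding; the core is X contracted with each U_n^T."""
    U = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
         for n, r in enumerate(ranks)]
    G = X
    for n, Un in enumerate(U):
        G = np.moveaxis(np.tensordot(Un.T, G, axes=(1, n)), 0, n)
    return G, U

X = np.random.rand(6, 7, 8)
G, (A, B, C) = hosvd(X, (3, 3, 3))
print(G.shape)  # (3, 3, 3)
```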
Tucker Decomposition
Name                                         Proposed by
Three-mode factor analysis (3MFA/Tucker3)    Tucker, 1966
Three-mode PCA (3MPCA)                       Kroonenberg and De Leeuw, 1980
N-mode PCA                                   Kapteyn et al., 1986
Higher-order SVD (HOSVD)                     De Lathauwer et al., 2000
N-mode SVD                                   Vasilescu and Terzopoulos, 2002
Table: Names for the Tucker decomposition (some specific to three-way and some for N-way).
Tensor Train Decomposition
TT Form

    A_{i_1 i_2 ··· i_d} = G_1(i_1) G_2(i_2) ··· G_d(i_d),

where G_1(i_1) is a 1 × R row vector, each middle core slice G_k(i_k) is an R × R matrix, and G_d(i_d) is an R × 1 column vector.

(Figure: a graphical representation of the tensor train decomposition.)
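Evaluating a single entry is a chain of small matrix products; a minimal sketch (illustration only, assuming numpy, with hypothetical cores):

```python
import numpy as np

def tt_entry(cores, idx):
    """Entry A[i_1, ..., i_d] of a tensor in TT format.
    Core k has shape (r_{k-1}, I_k, r_k), with r_0 = r_d = 1."""
    v = np.ones((1, 1))
    for G, i in zip(cores, idx):
        v = v @ G[:, i, :]  # chain of small matrix products
    return v[0, 0]

cores = [np.random.rand(1, 4, 2),
         np.random.rand(2, 5, 2),
         np.random.rand(2, 6, 1)]
print(tt_entry(cores, (0, 3, 2)))
```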
Tensor Ring Decomposition
TR Form
    A_{i_1 i_2 ··· i_d} = Tr{ Z_1(i_1) Z_2(i_2) ··· Z_d(i_d) } = Tr{ ∏_{k=1}^{d} Z_k(i_k) }

(Figure: a graphical representation of the tensor ring decomposition.)
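In TR format each core slice Z_k(i_k) is an R_k × R_{k+1} matrix with R_{d+1} = R_1, so the chain is square and closed by the trace. A minimal sketch (illustration only, assuming numpy, with hypothetical cores):

```python
import numpy as np

def tr_entry(cores, idx):
    """Entry of a tensor in tensor-ring format: multiply the selected
    core slices around the ring and close it with the trace."""
    M = None
    for Z, i in zip(cores, idx):
        M = Z[:, i, :] if M is None else M @ Z[:, i, :]
    return np.trace(M)

cores = [np.random.rand(2, 4, 3),
         np.random.rand(3, 5, 2),
         np.random.rand(2, 6, 2)]  # ring ranks (2, 3, 2)
print(tr_entry(cores, (1, 0, 4)))
```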
Agenda
- Background
- Tensor Decompositions (CP, Tucker, and Tensor-Train/Tensor-Ring)
- Transform-based Tensor Model and Applications
- Tensor Computations (cuTensor, TenDeC++)
Transform-based Model
Basic Operators
The operator matview(·) takes a tensor A ∈ C^{n_1×n_2×n_3×n_4} and returns an (n_1 n_3 n_4) × (n_2 n_3 n_4) block diagonal matrix, with each block being an n_1 × n_2 matrix, defined as

    matview(A) = diag(A_1, ···, A_p, ···, A_P), p ∈ [P],

and

    A_p(i, j) = A(i, j, k, l), p = (l − 1) n_3 + k, i ∈ [n_1], j ∈ [n_2], k ∈ [n_3], l ∈ [n_4],

where P = n_3 n_4 and [n] denotes the index set {1, 2, ···, n}. The operator tenview(·) folds matview(A) back into the tensor A, i.e.,

    tenview(matview(A)) = A.
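A minimal sketch of matview (illustration only, assuming numpy/scipy), with the blocks ordered by p = (l − 1) n_3 + k:

```python
import numpy as np
from scipy.linalg import block_diag

def matview(A):
    """Block-diagonal view of a 4-way tensor: one n1 × n2 block per
    (k, l) pair, with k varying fastest."""
    n1, n2, n3, n4 = A.shape
    blocks = [A[:, :, k, l] for l in range(n4) for k in range(n3)]
    return block_diag(*blocks)

A = np.random.rand(2, 3, 4, 5)
print(matview(A).shape)  # (2·4·5, 3·4·5) = (40, 60)
```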
Transform-based Model
Basic Operators
Given two fourth-order tensors A ∈ C^{n_1×n′×n_3×n_4} and B ∈ C^{n′×n_2×n_3×n_4}, the corresponding p-th blocks are A_p ∈ C^{n_1×n′} and B_p ∈ C^{n′×n_2}, and their multiplication is well-defined as C_p = A_p B_p ∈ C^{n_1×n_2}. Later, in the transform domain, we will need the following multiplication of two block diagonal matrices:

    matview(C) = matview(A) · matview(B),

where · denotes the conventional matrix multiplication.
Transform-based Model

Tensor-scalar multiplication

Given an invertible 2D discrete transform L : C^{1×1×n_3×n_4} → C^{1×1×n_3×n_4}, the element-wise multiplication ◦, and α, β ∈ C^{1×1×n_3×n_4}, we define the tensor-scalar multiplication

    α • β ≜ L^{−1}(L(α) ◦ L(β)),

where L^{−1} : C^{1×1×n_3×n_4} → C^{1×1×n_3×n_4} is the inverse transform.

Tensor-linear combinations

Given tensor scalars c_j ∈ C^{1×1×n_3×n_4}, j ∈ [n_2], a tensor-linear combination of the tensor-columns A_j ∈ C^{n_1×1×n_3×n_4}, j ∈ [n_2], is defined as

    A_1 • c_1 + ··· + A_{n_2} • c_{n_2} = A • c,

where A = [A_1, ···, A_{n_2}] and c = [c_1, ···, c_{n_2}]^T.
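A minimal sketch of the tensor-scalar product (illustration only, assuming numpy and taking L to be the 2D DFT along the last two modes, one possible choice of transform):

```python
import numpy as np

def ts_mul(alpha, beta):
    """α • β = L⁻¹(L(α) ◦ L(β)) with L chosen as the 2D DFT."""
    Fa = np.fft.fft2(alpha, axes=(-2, -1))
    Fb = np.fft.fft2(beta, axes=(-2, -1))
    return np.real(np.fft.ifft2(Fa * Fb, axes=(-2, -1)))

alpha = np.random.rand(1, 1, 3, 4)
beta = np.random.rand(1, 1, 3, 4)
print(ts_mul(alpha, beta).shape)  # (1, 1, 3, 4)
```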
Transform-based Model
L-product
The L-product C = A • B ∈ C^{n_1×n_2×n_3×n_4} of A ∈ C^{n_1×n′×n_3×n_4} and B ∈ C^{n′×n_2×n_3×n_4} is defined as

    C(i, j) = Σ_{k∈[n′]} A(i, k) • B(k, j), i ∈ [n_1], j ∈ [n_2].

Lemma. The L-product C = A • B can be calculated as follows. First, compute

    matview(C̄) = matview(Ā) · matview(B̄).

Then stack matview(C̄) back into the tensor tenview(matview(C̄)) and perform the inverse transform to get C, i.e., C = L^{−1}(C̄). The notation Ā denotes the transform-domain representation of A ∈ C^{n_1×n_2×n_3×n_4}, such that Ā = L(A) and A = L^{−1}(Ā).
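A minimal sketch of the lemma (illustration only, assuming numpy, with L again chosen as the 2D DFT); the einsum performs all of the block-wise matrix products at once:

```python
import numpy as np

def l_product(A, B):
    """L-product C = A • B: transform along modes 3 and 4, multiply
    the n1 × n′ frontal blocks slice by slice, then invert."""
    Fa = np.fft.fft2(A, axes=(2, 3))
    Fb = np.fft.fft2(B, axes=(2, 3))
    Fc = np.einsum('iakl,ajkl->ijkl', Fa, Fb)  # per-(k, l) matrix products
    return np.real(np.fft.ifft2(Fc, axes=(2, 3)))

A = np.random.rand(2, 3, 4, 5)
B = np.random.rand(3, 6, 4, 5)
print(l_product(A, B).shape)  # (2, 6, 4, 5)
```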
Transform-based Model

Thus the L-product can be viewed in the transform domain as

    L(C(i, j)) = Σ_{k∈[n′]} L(A(i, k)) ◦ L(B(k, j)),

which reduces to the block-wise matrix products C̄_p = Ā_p B̄_p, p ∈ [P].
Low-tubal-rank Model

Notations

    A⃗_i ≡ A(:, i, :),   A^(j) ≡ A(:, :, j),   Ā := fft(A, [], 3)

    bcirc(A) = [ A^(1)   A^(n)    A^(n−1)  ···  A^(2)
                 A^(2)   A^(1)    A^(n)    ···  A^(3)
                   ⋮       ⋮        ⋮      ⋱      ⋮
                 A^(n)   A^(n−1)  ···      A^(2)  A^(1) ]

    unfold(A) = [ A^(1)
                  A^(2)
                    ⋮
                  A^(n) ],   fold(unfold(A)) = A
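A minimal sketch of bcirc and unfold (illustration only, assuming numpy):

```python
import numpy as np

def bcirc(A):
    """Block-circulant matrix built from the frontal slices A[:, :, j]:
    block (i, j) is the frontal slice with index (i - j) mod n."""
    m, p, n = A.shape
    return np.block([[A[:, :, (i - j) % n] for j in range(n)]
                     for i in range(n)])

def unfold(A):
    """Stack the frontal slices vertically into an (m·n) × p matrix."""
    return np.concatenate([A[:, :, j] for j in range(A.shape[2])], axis=0)

A = np.random.rand(3, 2, 2)
print(bcirc(A).shape, unfold(A).shape)  # (6, 4) (6, 2)
```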
Low-tubal-rank Model
t-Product
A ∗ B = fold(bcirc(A) · unfold(B))
Example. Let A ∈ R^{3×2×2} with frontal slices

    A^(1) = [  1  0       A^(2) = [ −2  1
               0  2                 −2  7
              −1  3 ],               0 −1 ],

and let B⃗ ∈ R^{2×1×2} with frontal slices

    B^(1) = [  3          B^(2) = [ −2
              −1 ],                 −3 ].
Low-tubal-rank Model
    A ∗ B⃗ = fold( [  1  0 −2  1     [  3
                     0  2 −2  7       −1
                    −1  3  0 −1   ·   −2
                    −2  1  1  0       −3 ] )
                    −2  7  0  2
                     0 −1 −1  3 ]
          = fold( [4, −19, −3, −9, −19, −6]^T ) ∈ R^{3×1×2}.

In other words, C⃗ := A ∗ B⃗ is a 3 × 2 matrix, oriented as a lateral slice of a third-order tensor.
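Equivalently, the t-product can be computed slice-wise in the Fourier domain; a minimal sketch that reproduces the example (illustration only, assuming numpy):

```python
import numpy as np

def t_product(A, B):
    """t-product A ∗ B: FFT along the tubes, slice-wise matrix
    products in the Fourier domain, then inverse FFT."""
    Fa = np.fft.fft(A, axis=2)
    Fb = np.fft.fft(B, axis=2)
    Fc = np.einsum('ikn,kjn->ijn', Fa, Fb)
    return np.real(np.fft.ifft(Fc, axis=2))

A = np.array([[[1, -2], [0, 1]],
              [[0, -2], [2, 7]],
              [[-1, 0], [3, -1]]], dtype=float)  # A[:, :, 0] = A^(1)
B = np.array([[[3, -2]], [[-1, -3]]], dtype=float)
print(t_product(A, B)[:, 0, 0])  # [4, -19, -3], matching the example
```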
Low-tubal-rank Model
t-Linear combinations
Given k tubal scalars c⃗_j ∈ R^{1×1×n}, j = 1, 2, ···, k, a t-linear combination of X⃗_j ∈ R^{m×1×n}, j = 1, 2, ···, k, is defined as

    X⃗_1 ∗ c⃗_1 + X⃗_2 ∗ c⃗_2 + ··· + X⃗_k ∗ c⃗_k ≡ X ∗ C⃗,

where

    X := [X⃗_1, X⃗_2, ···, X⃗_k],   C⃗ := [c⃗_1; c⃗_2; ···; c⃗_k].

Example. Using A ∈ R^{3×2×2} and B⃗ ∈ R^{2×1×2} from the previous example, we see that
Low-tubal-rank Model
    A ∗ B⃗ = A⃗_1 ∗ b⃗_{11} + A⃗_2 ∗ b⃗_{21}
          = fold( [7, 4, −3, −8, −6, 2]^T ) + fold( [−3, −23, 0, −1, −13, −8]^T )
          = fold( [4, −19, −3, −9, −19, −6]^T ).

Thus, C⃗ := A ∗ B⃗ is a t-linear combination of the lateral slices of A.
Low-tubal-rank Model
Observation
Given a⃗, b⃗ ∈ R^{1×1×n}, the product a⃗ ∗ b⃗ can be computed as

    a⃗ ∗ b⃗ := ifft( fft(a⃗, [], 3) ⊛ fft(b⃗, [], 3), [], 3 ),

where ⊛ denotes the pointwise multiplication of two tubal scalars.

Factorizations of A are created (implicitly) by applying the appropriate matrix factorization to each frontal slice Ā^(i) in the Fourier domain:

    A = Q ∗ R  ⟺  Ā^(i) = Q̄^(i) R̄^(i).
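For instance, a t-QR can be obtained from slice-wise QR factorizations in the Fourier domain; a minimal sketch (illustration only, assuming numpy):

```python
import numpy as np

def t_qr(A):
    """t-QR (reduced form): QR of each frontal slice in the Fourier
    domain, then inverse FFT along the tubes."""
    m, p, n = A.shape
    k = min(m, p)
    Fa = np.fft.fft(A, axis=2)
    Q = np.empty((m, k, n), dtype=complex)
    R = np.empty((k, p, n), dtype=complex)
    for i in range(n):
        Q[:, :, i], R[:, :, i] = np.linalg.qr(Fa[:, :, i])
    return np.real(np.fft.ifft(Q, axis=2)), np.real(np.fft.ifft(R, axis=2))

A = np.random.rand(5, 3, 4)
Q, R = t_qr(A)
print(Q.shape, R.shape)  # (5, 3, 4) (3, 3, 4)
```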
Low-tubal-rank Model
t-SVD
    A = U ∗ S ∗ V^T = Σ_{i=1}^{min(l,m)} U⃗_i ∗ s_i ∗ V⃗_i^T,   s_i := S(i, i, :).

(Figure: the t-SVD of an l × m × n tensor.)
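Like the t-QR, the t-SVD reduces to slice-wise SVDs in the Fourier domain; a minimal sketch (illustration only, assuming numpy):

```python
import numpy as np

def t_svd(A):
    """t-SVD: SVD of each frontal slice in the Fourier domain, then
    inverse FFT; S comes back as an f-diagonal tensor."""
    l, m, n = A.shape
    k = min(l, m)
    Fa = np.fft.fft(A, axis=2)
    U = np.empty((l, l, n), dtype=complex)
    S = np.zeros((l, m, n), dtype=complex)
    V = np.empty((m, m, n), dtype=complex)
    for i in range(n):
        u, s, vh = np.linalg.svd(Fa[:, :, i])
        U[:, :, i], V[:, :, i] = u, vh.conj().T
        S[:k, :k, i] = np.diag(s)
    ifft = lambda T: np.real(np.fft.ifft(T, axis=2))
    return ifft(U), ifft(S), ifft(V)

A = np.random.rand(4, 3, 5)
U, S, V = t_svd(A)
print(U.shape, S.shape, V.shape)  # (4, 4, 5) (4, 3, 5) (3, 3, 5)
```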
Tubal-tensor Sparse Coding
Tubal-tensor Linear Combination
A two-dimensional image of size m × k is represented by a third-order tensor X ∈ R^{m×1×k}, which can be approximated by the t-product of D ∈ R^{m×r×k} and C ∈ R^{r×1×k}:

    X = D ∗ C = D(:, 1, :) ∗ C(1, 1, :) + D(:, 2, :) ∗ C(2, 1, :) + ··· + D(:, r, :) ∗ C(r, 1, :).

The tubal-tensor sparse coding model is thus built on the circular convolution operation.
Tubal-tensor Sparse Coding
Tubal-tensor Sparse Representation
A third-order tensor X ∈ R^{m×n×k} represents n images of size m × k. Let D ∈ R^{m×r×k} be the tensor dictionary, where each lateral slice D(:, j, :) is a tensor basis, and let C ∈ R^{r×n×k} hold the corresponding tensor representations. Each image X(:, j, :) is approximated by a sparse t-linear combination of the tensor bases. The tubal-tensor sparse coding (TubSC) model can be formulated as

    min_{D,C}  (1/2) ‖X − D ∗ C‖_F² + β ‖C‖_1
    s.t.  ‖D(:, j, :)‖_F² ≤ 1,  j = 1, 2, ···, r.

The TubSC model can be solved by alternating between tensor coefficient learning and tensor dictionary learning.
Tubal-tensor Sparse Coding
Tensor Coefficients Learning
    min_C  (1/2) ‖X − D ∗ C‖_F² + β ‖C‖_1

Using the low-tubal-rank model, the problem can be transformed to

    min_{unfold(C)}  (1/2) ‖unfold(X) − bcirc(D) · unfold(C)‖_F² + β ‖unfold(C)‖_1.

It can be solved by the Iterative Shrinkage-Thresholding algorithm based on Tensors (ISTT), which rewrites the objective as

    min_C  f(C) + β g(C),

where f(C) = (1/2) ‖X − D ∗ C‖_F² and g(C) = ‖C‖_1.
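The unfolded problem is a standard ℓ₁-regularized least squares, so a generic ISTA iteration applies; a minimal sketch (illustration only, assuming numpy; here M stands in for bcirc(D) and y for unfold(X)):

```python
import numpy as np

def ista(M, y, beta, n_iter=200):
    """ISTA for min_c 0.5‖y − Mc‖² + β‖c‖₁: gradient step on the
    smooth part, then soft-thresholding for the ℓ₁ part."""
    L = np.linalg.norm(M, 2) ** 2          # Lipschitz constant of ∇f
    c = np.zeros(M.shape[1])
    for _ in range(n_iter):
        g = c - M.T @ (M @ c - y) / L      # gradient step
        c = np.sign(g) * np.maximum(np.abs(g) - beta / L, 0.0)
    return c

M, y = np.random.rand(20, 10), np.random.rand(20)
print(ista(M, y, beta=0.1))
```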
Tubal-tensor Sparse Coding
Tensor Dictionary Learning
    min_D  (1/2) ‖X − D ∗ C‖_F²
    s.t.  ‖D(:, j, :)‖_F² ≤ 1,  j = 1, 2, ···, r.

We transform this problem into the frequency domain:

    min_{D̄^(l)}  Σ_{l=1}^{k} ‖X̄^(l) − D̄^(l) C̄^(l)‖_F²
    s.t.  Σ_{l=1}^{k} ‖D̄^(l)(:, j)‖_F² ≤ k,  j = 1, 2, ···, r.

The Lagrange dual (Lee et al., 2007) is then adopted, solving for the dual variables with Newton's method.
Agenda
- Background
- Tensor Decompositions (CP, Tucker, and Tensor-Train/Tensor-Ring)
- Transform-based Tensor Model and Applications
- Tensor Computations (cuTensor, TenDeC++)
cuTensor-tubal (GPU)

This library provides a general approach to computing low-tubal-rank tensor operations in the frequency domain on GPUs:

1. Obtain the frequency-domain representation of the input tensor by performing the Fourier transform along the third dimension (tube-wise DFT) on the GPU.
2. In the frequency domain, the tensor operations separate into many independent complex matrix computations with strong parallelism.
3. Convert the frequency-domain results back to the time domain through the inverse Fourier transform along the third dimension (tube-wise inverse DFT) on the GPU.

(Figure: system architecture of the cuTensor-tubal library.)
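The same three-step pattern can be sketched with CuPy as a stand-in for the library's CUDA kernels (illustration only; this is not the cuTensor-tubal API):

```python
import cupy as cp

def t_product_gpu(A, B):
    """The cuTensor-tubal pattern: (1) tube-wise DFT, (2) batched
    slice-wise matrix products in the frequency domain, (3) tube-wise
    inverse DFT. All three steps run on the GPU."""
    Fa = cp.fft.fft(A, axis=2)                  # step 1
    Fb = cp.fft.fft(B, axis=2)
    Fc = cp.einsum('ikn,kjn->ijn', Fa, Fb)      # step 2
    return cp.real(cp.fft.ifft(Fc, axis=2))     # step 3

A = cp.random.rand(64, 32, 16)
B = cp.random.rand(32, 48, 16)
print(t_product_gpu(A, B).shape)  # (64, 48, 16)
```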
cuTensor-tubal (GPU)
Operation         Input                              Output
t-FFT             A ∈ R^{m×n×k}                      Ā ∈ C^{m×n×k}
inverse t-FFT     Ā ∈ C^{m×n×k}                      A ∈ R^{m×n×k}
t-product         A ∈ R^{m×l×k}, B ∈ R^{l×n×k}       C ∈ R^{m×n×k}
t-SVD             T ∈ R^{m×n×k}                      U ∈ R^{m×m×k}, V ∈ R^{n×n×k}, S ∈ R^{m×n×k}
t-QR              T ∈ R^{m×n×k}                      Q ∈ R^{m×m×k}, R ∈ R^{m×n×k}
t-inverse         T ∈ R^{n×n×k}                      T^{−1} ∈ R^{n×n×k}
t-normalization   T ∈ R^{m×1×k}                      normalized T ∈ R^{m×1×k}
Table: Seven tensor operations in the cuTensor-tubal library
cuTensor-tubal (GPU)
Key Challenges
- Data Transfer Between the CPU and GPU
- Alternative Access to Tube and Slice Data Structures
- Parallelizing the Fourier Transforms and Matrix Computations
cuTensor-tubal (GPU)

Efficient Data Transfer

(Figure: overlapping data transfer with computation.)
cuTensor-tubal (GPU)

Uniform Memory Access to Tube and Slice Data Structures

(Figures: tensors are stored as a 1D array in memory; data structures in tensor computations.)
cuTensor-tubal (GPU)

Parallelizing the Fourier Transforms and Matrix Computations

Operation         #(FFT operations)   #(matrix operations)   #(inverse FFT operations)
t-FFT             m × n               none                   none
inverse t-FFT     none                none                   m × n
t-product         m × l + l × n       k                      m × n
t-SVD             m × n               k                      m × m + n × n + m × n
t-QR              m × n               k                      m × m + m × n
t-inverse         n × n               k                      n × n
t-normalization   m                   k                      m

Table: Operation counts for the seven tensor operations in the cuTensor-tubal library.
cuTensor-tubal (GPU)
(Figures: system workflow of the cuTensor-tubal library; memory access operators.)
TenDeC++ (CPU)

TenDeC++ is a new C++ library for tensor decompositions, built on a novel underlying technique called PointerDeformer that leverages the unique pointer capabilities of C++ to further exploit the language's potential. TenDeC++ supports:

- Canonical Polyadic (CP) decomposition
- Tucker decomposition
- Tensor-train decomposition
- t-SVD

Compared with TensorLy in Python and Tensorlab in MATLAB, TenDeC++ reduces decomposition time by more than 83.7% and 53.3%, respectively, and supports tensors 2.5× and 2× larger.

(Figure: system architecture of the TenDeC++ library.)
TenDeC++ (CPU)

PointerDeformer

A 3D tensor is stored as a 1D array in memory. Accessing these data in different orders yields size-specific matrices, including three mode-n views: the column-major, row-major, and concatenation views. These virtual views motivate the design of PointerDeformer, which skips the time-consuming unfolding operation in C++.
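The same idea can be seen in numpy terms (illustration only; TenDeC++ itself implements this with raw pointers in C++): when the requested matrix layout matches the strides of the underlying 1D buffer, an unfolding is a reinterpretation of the same memory rather than a copy.

```python
import numpy as np

X = np.arange(24).reshape(3, 4, 2)  # one contiguous 1D buffer
M = X.reshape(3, 8)                 # mode-1 unfolding: a zero-copy view
M[0, 0] = -1
print(X[0, 0, 0])                   # -1: M shares X's memory
```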
TenDeC++ (CPU)

Optimized Basic Tensor Operation: n-mode Product

Compared with the traditional process, the optimized process does not need the time-consuming unfold/fold operations. Instead, PointerDeformer achieves the virtual transformation by accessing the data in a specific order in memory.
TenDeC++ (CPU)

Other Acceleration Techniques

- Exploit symmetry with PointerDeformer
- Exploit conjugate symmetry for the t-SVD decomposition

Based on the conjugate-symmetry property of the FFT of real input data, we have conj(X̄^(j)) = X̄^(k−j+2) for j = 2, 3, ···, ⌈(k+1)/2⌉, where conj(X) denotes the conjugate of a matrix X. Taking the conjugate on both sides of X̄^(j) = Ū^(j) S̄^(j) V̄^(j) gives

    X̄^(k−j+2) = conj(Ū^(j)) · conj(S̄^(j)) · conj(V̄^(j)).

Hence, with conjugate symmetry, the t-SVD only needs to perform SVDs on half of the frontal slices.
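A quick numerical check of the symmetry (illustration only, assuming numpy; 0-indexed, so slice j pairs with slice k − j):

```python
import numpy as np

X = np.random.rand(4, 3, 6)
F = np.fft.fft(X, axis=2)  # FFT of a real tensor along the tubes
k = X.shape[2]
for j in range(1, k // 2 + 1):
    # Slice j and slice k - j are complex conjugates, so an SVD of one
    # determines the SVD of the other.
    assert np.allclose(np.conj(F[:, :, j]), F[:, :, k - j])
```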
TenDeC++ (CPU)

Performance

(Figures: running time of CP decomposition; running time of Tucker decomposition.)
TenDeC++ (CPU)

Performance

(Figure: running time of t-SVD.)