Tensors (slides from engineering.purdue.edu)
o Contravariant and covariant tensors
o Einstein notation
o Interpretation
What is a Tensor?
§ It is the generalization of a:
– Matrix operator to >2 dimensions
– Vector data to >1 dimension
§ Why do we need it?
– Came out of differential geometry.
– Was used by Albert Einstein to formulate the theory of General Relativity.
– It's needed for many problems, including Deep NNs.

[Figure: a tensor as an operator mapping an input to an output.]
Contravariant and Covariant Vectors
§ Contravariant vectors:
– "Column vectors" that represent data
– Vectors that describe the position of something
– $x^i$ for $0 \le i < N$ and $x \in \Re^N$
§ Covariant vectors:
– "Row vectors" that operate on data
– Gradient vectors
– $a_i$ for $0 \le i < N$
§ Einstein notation
$$y = a_i x^i = \sum_{i=0}^{N-1} a_i x^i$$
– Leave out the sum to make the notation cleaner
– Always sum over any index that appears twice
§ In matrix form: $y = a x$, where $a$ (covariant) is $1 \times N$ and $x$ (contravariant) is $N \times 1$.

[Figure: picture of the multiplication of a covariant row vector by a contravariant column vector.]
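The contraction $y = a_i x^i$ can be spot-checked numerically. A minimal sketch using NumPy's `einsum`, where the repeated index `i` is summed automatically (the particular values are illustrative, not from the slides):

```python
import numpy as np

N = 4
a = np.array([1.0, 2.0, 3.0, 4.0])   # covariant row vector a_i
x = np.array([0.5, 1.0, 1.5, 2.0])   # contravariant column vector x^i

# Einstein notation: the repeated index i is summed over.
y = np.einsum("i,i->", a, x)

# The same contraction as an explicit sum and as a 1xN by Nx1 product.
y_sum = sum(a[i] * x[i] for i in range(N))
y_mat = (a.reshape(1, N) @ x.reshape(N, 1))[0, 0]

assert np.isclose(y, y_sum) and np.isclose(y, y_mat)
```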
Vector-Matrix Products as Tensors
§ 1-D Contravariant vectors:
– $x^i$ for $0 \le i < N_x$
– $y^j$ for $0 \le j < N_y$
§ 2-D tensor (i.e., matrix):
– $A^j_i$ for $0 \le j < N_y$ and $0 \le i < N_x$
– There is a gradient (covariant row) for each component $y^j$
§ Einstein notation
$$y^j = A^j_i x^i = \sum_{i=0}^{N_x-1} A^j_i x^i$$
– Leave out the sum to make the notation cleaner
– Always sum over any index that appears twice
§ In matrix form: $y = A x$, where $A$ is $N_y \times N_x$, $x$ is $N_x \times 1$, and $y$ is $N_y \times 1$.

[Figure: picture of the matrix-vector multiplication, with the rows of $A$ covariant and $x$, $y$ contravariant.]
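The index form $y^j = A^j_i x^i$ is exactly an ordinary matrix-vector product. A short sketch (sizes are illustrative):

```python
import numpy as np

Ny, Nx = 2, 3
A = np.arange(6, dtype=float).reshape(Ny, Nx)  # 2-D tensor A^j_i
x = np.array([1.0, 2.0, 3.0])                  # contravariant x^i

# y^j = A^j_i x^i : sum over the repeated index i.
y = np.einsum("ji,i->j", A, x)

assert np.allclose(y, A @ x)
```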
Tensor Products
§ Einstein notation
$$y^{j_1,j_2} = A^{j_1,j_2}_{i_1,i_2}\, x^{i_1,i_2}$$
§ 2-D Contravariant vectors:
– $x^{i_1,i_2}$ for $0 \le i_1, i_2 < N_x$
– $y^{j_1,j_2}$ for $0 \le j_1, j_2 < N_y$
§ 4-D Tensor:
– $A^{j_1,j_2}_{i_1,i_2}$ for $0 \le j_1, j_2 < N_y$ and $0 \le i_1, i_2 < N_x$
– $A$ is known as a tensor
• 2-D covariant input
• 2-D contravariant output
§ Sum over pairs of repeated indices.
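The 4-D contraction above can be sketched with `einsum`; summing over the index pair $(i_1, i_2)$ is equivalent to flattening $A$ into an $(N_y N_y) \times (N_x N_x)$ matrix acting on the flattened input (sizes and the random data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
Nx, Ny = 3, 2
A = rng.standard_normal((Ny, Ny, Nx, Nx))  # A^{j1,j2}_{i1,i2}
x = rng.standard_normal((Nx, Nx))          # x^{i1,i2}

# y^{j1,j2} = A^{j1,j2}_{i1,i2} x^{i1,i2} : sum over the pair (i1, i2).
y = np.einsum("JKjk,jk->JK", A, x)

# Equivalent flattened view: a (Ny*Ny) x (Nx*Nx) matrix times a vector.
y_flat = A.reshape(Ny * Ny, Nx * Nx) @ x.reshape(Nx * Nx)
assert np.allclose(y, y_flat.reshape(Ny, Ny))
```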
Picture of a Tensor Product
§ For example, if we have
$$y^j = A^j_{i_1,i_2}\, x^{i_1,i_2}$$
– Input: 2-D image $x$ indexed by $i_1, i_2$ (a contravariant 2-D tensor)
– Output: 1-D vector $y$ indexed by $j$
– The 3-D tensor $A$ has one 2-D covariant slice for each output index $j$
– General idea: $x \in \Re^{N_1 \times N_2}$

[Figure: the 3-D tensor $A$ contracts the 2-D image $x$ into the 1-D vector $y$.]
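This image-to-vector contraction can be sketched the same way: each component $y^j$ is the full contraction of one 2-D slice of $A$ with the image (sizes and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N1, N2, Ny = 4, 5, 3
A = rng.standard_normal((Ny, N1, N2))  # 3-D tensor A^j_{i1,i2}
x = rng.standard_normal((N1, N2))      # 2-D "image" x^{i1,i2}

# y^j = A^j_{i1,i2} x^{i1,i2}: each output component is the full
# contraction of one 2-D covariant slice of A with the image.
y = np.einsum("jkl,kl->j", A, x)

assert np.allclose(y, A.reshape(Ny, -1) @ x.ravel())
```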
Some Useful Definitions
§ Delta functions:
$$\delta^j_i = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$$
– So then we have that $A^i_j = A^i_k\, \delta^k_j$
§ Gradient w.r.t. a vector: $\nabla_x (Ax) = A$
– In tensor notation, we have that
$$[\nabla_x (Ax)]^j_i = A^j_i$$
§ Gradient w.r.t. a matrix: $\nabla_A (Ax) = \,?$
– First, it must have 1 output dimension and 2 input dimensions. So we have that
$$[\nabla_A (Ax)]^j_{i_1,i_2} = \delta^j_{i_1}\, x^{i_2}$$
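The formula $[\nabla_A(Ax)]^j_{i_1,i_2} = \delta^j_{i_1} x^{i_2}$ can be sanity-checked against finite differences. A sketch (the sizes and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
Ny, Nx = 3, 4
A = rng.standard_normal((Ny, Nx))
x = rng.standard_normal(Nx)

# Analytic gradient: [grad_A(Ax)]^j_{i1,i2} = delta^j_{i1} * x^{i2}.
delta = np.eye(Ny)
grad = np.einsum("ji,k->jik", delta, x)   # shape (Ny, Ny, Nx)

# Finite-difference check of d(Ax)^j / dA^{i1}_{i2}.
eps = 1e-6
fd = np.zeros((Ny, Ny, Nx))
for i1 in range(Ny):
    for i2 in range(Nx):
        Ap = A.copy(); Ap[i1, i2] += eps
        fd[:, i1, i2] = (Ap @ x - A @ x) / eps

assert np.allclose(grad, fd, atol=1e-4)
```

Because $Ax$ is linear in $A$, the finite-difference quotient is exact up to rounding, so the two tensors agree closely.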
GD for Single Layer NN
o Structure of the Gradient
o Gradients for NN parameters
o Updates for NN parameters
Gradient Direction for Single Layer NN
§ Single layer NN:
$$y_\theta(x) = b\, f(Ax + c)$$
– We will need the gradient w.r.t. the parameters $\theta = (A, b, c)$:
$$\nabla_\theta y_\theta(x) = \left[ \nabla_A y_{A,b,c}(x),\ \nabla_b y_{A,b,c}(x),\ \nabla_c y_{A,b,c}(x) \right]$$
– Later, we will also need:
$$\nabla_x y_\theta(x)$$

[Figure: block diagram $x \to A \to Ax + c \to f(\cdot) \to b \to y_\theta(x)$.]
Gradient Structure for Single Layer NN
§ Single layer NN:
$$y_\theta(x) = b\, f(Ax + c)$$
– Parameters are $\theta = (A, b, c)$
– We will need the parameter gradients:
$$\nabla_\theta y_\theta(x) = \left[ \nabla_A y_{A,b,c}(x),\ \nabla_b y_{A,b,c}(x),\ \nabla_c y_{A,b,c}(x) \right]$$
– And the input gradient:
$$\nabla_x y_\theta(x)$$

[Figure: block diagram of the network with each gradient attached to its stage: $\nabla_A y$ at $A$, $\nabla_c y$ at $c$, $\nabla_b y$ at $b$, and $\nabla_x y$ at the input.]
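A minimal forward-pass sketch of this single-layer network. The slides do not fix a particular nonlinearity, so the ReLU here (and all sizes) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
N0, N1, N2 = 4, 5, 2          # input, hidden, output widths (illustrative)
A = rng.standard_normal((N1, N0))
c = rng.standard_normal(N1)
b = rng.standard_normal((N2, N1))

def f(z):
    # Componentwise nonlinearity; ReLU is chosen only as an example.
    return np.maximum(z, 0.0)

def y_theta(x):
    # y_theta(x) = b f(Ax + c)
    return b @ f(A @ x + c)

x = rng.standard_normal(N0)
y = y_theta(x)
assert y.shape == (N2,)
```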
Gradient w.r.t. x
§ For this case,
$$y_\theta(x) = b\, f(Ax + c)$$
– Using Einstein notation,
$$\nabla_x y = \nabla_x y_\theta(x) = b\, [\nabla f](Ax + c)\, A$$
$$[\nabla_x y]^{j}_{i} = b^{j}_{j_1}\, [\nabla f]^{j_1}_{j_2}\, A^{j_2}_{i}$$
– This is a matrix or 2-D tensor.
Gradient w.r.t. A
§ For this case,
$$y_\theta(x) = b\, f(Ax + c)$$
– Using Einstein notation, since
$$[\nabla_A (Ax)]^{j}_{i_1,i_2} = \delta^{j}_{i_1}\, x^{i_2}$$
– Then
$$\nabla_A y = b\, [\nabla f](Ax + c)\, \nabla_A (Ax)$$
$$[\nabla_A y]^{j}_{i_1,i_2} = b^{j}_{j_1}\, [\nabla f]^{j_1}_{j_2}\, [\nabla_A (Ax)]^{j_2}_{i_1,i_2} = b^{j}_{j_1}\, [\nabla f]^{j_1}_{j_2}\, \delta^{j_2}_{i_1} x^{i_2} = b^{j}_{j_1}\, [\nabla f]^{j_1}_{i_1}\, x^{i_2}$$
– This is a 3-D tensor!
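The 3-D tensor formula $[\nabla_A y]^j_{i_1,i_2} = b^j_{j_1} [\nabla f]^{j_1}_{i_1} x^{i_2}$ can be verified against finite differences. A sketch; the smooth tanh nonlinearity and all sizes are illustrative assumptions (tanh keeps the finite-difference check well behaved):

```python
import numpy as np

rng = np.random.default_rng(4)
N0, N1, N2 = 3, 4, 2
A = rng.standard_normal((N1, N0))
c = rng.standard_normal(N1)
b = rng.standard_normal((N2, N1))
x = rng.standard_normal(N0)

f = np.tanh                                # smooth nonlinearity (illustrative)
fprime = lambda z: 1.0 - np.tanh(z) ** 2   # its componentwise derivative

z = A @ x + c
# [grad_A y]^j_{i1,i2} = b^j_{j1} [grad f]^{j1}_{i1} x^{i2},
# where [grad f] is the diagonal Jacobian of the componentwise f.
grad = np.einsum("jk,k,i->jki", b, fprime(z), x)   # shape (N2, N1, N0)

# Central finite differences over each entry of A.
eps = 1e-6
fd = np.zeros((N2, N1, N0))
for i1 in range(N1):
    for i2 in range(N0):
        Ap = A.copy(); Ap[i1, i2] += eps
        Am = A.copy(); Am[i1, i2] -= eps
        fd[:, i1, i2] = (b @ f(Ap @ x + c) - b @ f(Am @ x + c)) / (2 * eps)

assert np.allclose(grad, fd, atol=1e-5)
```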
Gradient w.r.t. c
§ For this case,
$$y_\theta(x) = b\, f(Ax + c)$$
– Using Einstein notation,
$$\nabla_c y = b\, [\nabla f](Ax + c)\, \nabla_c (Ax + c) = b\, [\nabla f]\, I = b\, [\nabla f]$$
$$[\nabla_c y]^{j}_{i} = b^{j}_{j_1}\, [\nabla f]^{j_1}_{i}$$
Gradient w.r.t. b
§ For this case,
$$y_\theta(x) = b\, f(Ax + c)$$
– Using Einstein notation, since
$$[\nabla_b (b f)]^{j}_{i_1,i_2} = \delta^{j}_{i_1}\, f^{i_2}$$
– We have that
$$[\nabla_b y]^{j}_{i_1,i_2} = \delta^{j}_{i_1}\, f^{i_2}, \qquad \text{where } f = f(Ax + c)$$
– This is a 3-D tensor!
Update Direction for A
§ Gradient step: $\theta \leftarrow \theta + \alpha d$, where $d$ is given by
$$d_{i_1,i_2} = -\left[\nabla_A L(\theta)\right]_{i_1,i_2} = \sum_{n=0}^{N_t-1} \epsilon_{n,j_1}\, b^{j_1}_{j_2}\, [\nabla f_n]^{j_2}_{i_1}\, x_n^{i_2}$$
– where $\epsilon_n$ is the $1 \times N_2$ row vector of negative loss gradients w.r.t. the network output for training sample $n$, and $\nabla f_n = \nabla f(Ax_n + c)$.
§ For efficiency, computation goes this way (left to right):
$$\sum_{n=0}^{N_t-1} \underbrace{\epsilon_n}_{1 \times N_2}\ \underbrace{b}_{N_2 \times N_1}\ \underbrace{\nabla f_n}_{N_1 \times N_1}\ \otimes\, x_n$$
– Contract $\epsilon_n b$ first (a $1 \times N_1$ row), apply $\nabla f_n$, then take the outer product with $x_n$.
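The "computation goes this way" remark is about associativity: contracting $\epsilon_n b$ first keeps every intermediate a small row vector instead of a large tensor. A sketch with illustrative sizes and random stand-ins for the per-sample quantities (names like `eps_` and `fp` are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)
N0, N1, N2, Nt = 6, 5, 3, 10
b = rng.standard_normal((N2, N1))
eps_ = rng.standard_normal((Nt, N2))   # per-sample output-loss gradients
fp = rng.standard_normal((Nt, N1))     # diagonal of grad f_n
X = rng.standard_normal((Nt, N0))      # training inputs x_n

# Left to right: row = (eps_n b) scaled by diag(grad f_n), then the
# outer product with x_n; accumulated over the training samples.
d = np.zeros((N1, N0))
for n in range(Nt):
    row = (eps_[n] @ b) * fp[n]        # 1 x N1 at every step
    d += np.outer(row, X[n])           # N1 x N0

# One-shot einsum of the same contraction, for comparison.
d_ein = np.einsum("nj,jk,nk,ni->ki", eps_, b, fp, X)
assert np.allclose(d, d_ein)
```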
Update Direction for c
§ Gradient step: $\theta \leftarrow \theta + \alpha d$, where $d$ is given by
$$d_{i} = -\left[\nabla_c L(\theta)\right]_{i} = \sum_{n=0}^{N_t-1} \epsilon_{n,j_1}\, b^{j_1}_{j_2}\, [\nabla f_n]^{j_2}_{i}$$
§ For efficiency, computation goes this way (left to right):
$$\sum_{n=0}^{N_t-1} \underbrace{\epsilon_n}_{1 \times N_2}\ \underbrace{b}_{N_2 \times N_1}\ \underbrace{\nabla f_n}_{N_1 \times N_1}$$
– The result is a length-$N_1$ vector, matching the shape of $c$.
Update Direction for b
§ Gradient step: $\theta \leftarrow \theta + \alpha d$, where $d$ is given by
$$d_{i_1,i_2} = -\left[\nabla_b L(\theta)\right]_{i_1,i_2} = \sum_{n=0}^{N_t-1} \epsilon_{n,i_1}\, f_n^{i_2}$$
§ For efficiency, computation goes this way:
$$\sum_{n=0}^{N_t-1} \underbrace{\epsilon_n}_{1 \times N_2} \otimes\, \underbrace{f_n}_{N_1}$$
– Each term is the outer product of $\epsilon_n$ with $f_n = f(Ax_n + c)$, giving an $N_2 \times N_1$ result matching the shape of $b$.
Local and Global Minima
o Open and Closed Sets
o Convex Sets and Functions
o Properties of Convex Functions
o Local Minimum, Saddle Points, and Global Minima
o Optimization Theorems
Open and Closed Sets
§ Define:
– $S \subset \Re^N$
– The open ball of radius $\epsilon$ centered at $x_0$ is $B(x_0, \epsilon) = \{ x \in \Re^N : \| x - x_0 \| < \epsilon \}$.
§ A set $S$ is open if
– At every point, there is an open ball contained in $S$:
– $\forall x \in S, \exists \epsilon > 0$ s.t. $B(x, \epsilon) \subset S$.
§ A set $S$ is closed if $S^c = \Re^N - S$ is open.
§ A set $S$ is compact if it is closed and bounded.
§ Facts:
– $\Re^N$ is both open and closed, but it is not compact.
– If $S$ is compact, then every sequence in $S$ has a limit point in $S$.
Convex Sets
§ A set $S$ is convex if
$$\forall \lambda \in [0,1],\ \forall x, y \in S:\quad \lambda y + (1 - \lambda) x \in S$$
§ Properties:
– The intersection of convex sets is convex.

[Figure: in a nonconvex set, the line connecting two points is sometimes outside the set; in a convex set, it is always inside.]
Convex Functions
§ Let $f : S \to \Re$ where $S$ is a convex set. Then we say that $f$ is a convex function if
$$\forall \lambda \in [0,1],\ \forall x, y \in S:\quad f(\lambda y + (1 - \lambda) x) \le \lambda f(y) + (1 - \lambda) f(x)$$
§ Properties:
– The sum of convex functions is convex.
– The maximum of a set of convex functions is convex.
– If $f(x)$ is convex, then $f(Ax)$ is also convex.
– $f(x)$ is concave if $-f(x)$ is convex.
– $f(x) = Ax + b$ is both convex and concave.

[Figure: for a convex function, the chord connecting two points on the graph lies above the function; for a nonconvex function, it is sometimes below.]
Properties of Convex Functions
§ Valuable properties of convex functions
• The sum of convex functions is convex.
– Let $f(x) = \sum_i f_i(x)$. If the $f_i(x)$ are convex, then $f(x)$ is convex.
• The maximum of a set of convex functions is convex.
– Let $f(x) = \max_i f_i(x)$. If the $f_i(x)$ are convex, then $f(x)$ is convex.
• The second derivative of a convex function is non-negative.
– Let $f(x)$ have two continuous derivatives, and let $H(x) = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j} \right]$ be the Hessian of $f$ at $x$. Then $f(x)$ is convex if and only if $H(x)$ is non-negative definite for all $x$.
• A convex function of a linear transform is convex.
– If $f(x)$ is convex, then $f(Ax)$ is also convex.
• An affine transform is both convex and concave.
– $f(x) = Ax + b$ is both convex and concave.
• A function is concave if its negative is convex.
– $f(x)$ is concave if $-f(x)$ is convex.
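The max-of-convex-functions property can be spot-checked numerically by sampling the defining inequality. A random check cannot prove convexity, only fail to refute it; the two component functions below are illustrative choices:

```python
import numpy as np

# f(x) = max(f1(x), f2(x)) with f1, f2 convex; check the convexity
# inequality f(l*y + (1-l)*x) <= l*f(y) + (1-l)*f(x) at random points.
f1 = lambda x: (x - 1.0) ** 2       # convex parabola
f2 = lambda x: np.abs(x) + 0.5      # convex V-shape
f = lambda x: max(f1(x), f2(x))

rng = np.random.default_rng(6)
for _ in range(1000):
    x, y = rng.uniform(-5, 5, size=2)
    lam = rng.uniform(0, 1)
    lhs = f(lam * y + (1 - lam) * x)
    rhs = lam * f(y) + (1 - lam) * f(x)
    assert lhs <= rhs + 1e-12       # Jensen-type inequality never violated
```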
Local Minimum
§ Let $f : S \to \Re$ where $S \subset \Re^N$.
– We say that $x^* \in S$ is a local minimum of $f$ if $\exists \epsilon > 0$ s.t. $\forall x \in S \cap B(x^*, \epsilon)$, $f(x) \ge f(x^*)$.
• Necessary condition for local minima:
– Let $f$ be continuously differentiable and let $x^*$ be a local minimum; then $\nabla f(x^*) = 0$.

[Figure: a 1-D function with a local minimum at $x^*$, where $\nabla f(x^*) = 0$.]
Saddle Point
§ Let $f : S \to \Re$, where $S \subset \Re^N$, be a continuously differentiable function.
– We say that $x^* \in S$ is a saddle point of $f$ if $\nabla f(x^*) = 0$ and $x^*$ is not a local minimum.
– Saddle points can cause more problems than you might think.

[Figures: a saddle point in 1-D, and the graph of $f(x) = x_1^2 - x_2^2$ showing a saddle point in 2-D (shared from Wikipedia).]
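For $f(x) = x_1^2 - x_2^2$, the gradient vanishes at the origin even though the origin is not a local minimum. A quick numeric check:

```python
import numpy as np

def f(x):
    return x[0] ** 2 - x[1] ** 2

def grad_f(x):
    return np.array([2 * x[0], -2 * x[1]])

x0 = np.array([0.0, 0.0])
assert np.allclose(grad_f(x0), 0.0)    # stationary point

t = 1e-3
assert f(np.array([0.0, t])) < f(x0)   # f decreases along x2: not a minimum
assert f(np.array([t, 0.0])) > f(x0)   # but increases along x1: a saddle
```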
Global Minimum
§ Let $f : S \to \Re$ where $S \subset \Re^N$.
– We say that $x^* \in S$ is a global minimum of $f$ if $\forall x \in S$, $f(x) \ge f(x^*)$.
§ Comments:
– In general, finding the global minimum is difficult.
– Gradient descent optimization typically becomes trapped in local minima.

[Figure: a nonconvex function where GD steps become trapped in a local minimum away from the global minimum $x^*$.]
Optimization Theorems
§ Let $f : S \to \Re$ where $S \subset \Re^N$.
– If $f$ is continuous and $S$ is compact, then $f$ takes on a global minimum in $S$.
– If $f$ is convex on $S$, then any local minimum is a global minimum.
– If $f$ is continuously differentiable and convex on $S$, then $\nabla f(x^*) = 0$ implies that $x^* \in S$ is a global minimum of $f$.
§ Important facts:
– The global minimum may not be unique.
– If $S$ is closed but not bounded, then $f$ may not take on a global minimum.
– Generally speaking, gradient descent algorithms converge to the global minimum of continuously differentiable convex functions.
– Most interesting functions in ML are not convex!

[Figure: GD steps on a convex function converging to the global minimum.]
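On a smooth convex function, plain gradient descent with a small enough step converges to the global minimum. A sketch on the convex quadratic $f(x) = \frac{1}{2} x^T Q x$ (the matrix $Q$, step size, and iteration count are illustrative):

```python
import numpy as np

# Convex quadratic f(x) = 0.5 x^T Q x with Q positive definite,
# so the unique global minimum is x* = 0.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])

def grad_f(x):
    return Q @ x

x = np.array([4.0, -3.0])
alpha = 0.1                        # step size below 2 / lambda_max(Q)
for _ in range(500):
    x = x - alpha * grad_f(x)      # theta <- theta + alpha * d, d = -grad

assert np.linalg.norm(x) < 1e-6    # converged to the global minimum
```

With the step size above the iteration contracts by a factor $\max_j |1 - \alpha \lambda_j| < 1$ per step, which is why convergence here is geometric; on a nonconvex loss the same iteration can instead stop at a local minimum or saddle point, as the earlier slides illustrate.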