Tensors (slides from engineering.purdue.edu)
o Contravariant and covariant tensors
o Einstein notation
o Interpretation
What is a Tensor?
§ It is the generalization of a:
– Matrix operator to >2 dimensions
– Vector data to >1 dimension
§ Why do we need it?
– Came out of differential geometry.
– Was used by Albert Einstein to formulate the theory of General Relativity.
– It's needed for many problems, including Deep NNs.

[Figure: a tensor as an operator mapping an input to an output.]
Contravariant and Covariant Vectors
§ Contravariant vectors:
– "Column vectors" that represent data
– Vectors that describe the position of something
– $x^i$ for $0 \le i < N$ and $x \in \Re^N$
§ Covariant vectors:
– "Row vectors" that operate on data
– Gradient vectors
– $a_i$ for $0 \le i < N$
§ Einstein notation
$$y = a_i x^i = \sum_{i=0}^{N-1} a_i x^i$$
– Leave out the sum to make the notation cleaner
– Always sum over any index that appears twice
§ In matrix form: $y = a x$, where $a$ (covariant) is $1 \times N$ and $x$ (contravariant) is $N \times 1$.

[Figure: picture of the multiplication of a covariant row vector by a contravariant column vector.]
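The contraction $y = a_i x^i$ can be spot-checked numerically. A minimal sketch using NumPy's `einsum`, where the repeated index `i` is summed automatically (the particular values are illustrative, not from the slides):

```python
import numpy as np

N = 4
a = np.array([1.0, 2.0, 3.0, 4.0])   # covariant row vector a_i
x = np.array([0.5, 1.0, 1.5, 2.0])   # contravariant column vector x^i

# Einstein notation: the repeated index i is summed over.
y = np.einsum("i,i->", a, x)

# The same contraction as an explicit sum and as a 1xN by Nx1 product.
y_sum = sum(a[i] * x[i] for i in range(N))
y_mat = (a.reshape(1, N) @ x.reshape(N, 1))[0, 0]

assert np.isclose(y, y_sum) and np.isclose(y, y_mat)
```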
Vector-Matrix Products as Tensors
§ 1-D Contravariant vectors:
– $x^i$ for $0 \le i < N_x$
– $y^j$ for $0 \le j < N_y$
§ 2-D tensor (i.e., matrix):
– $A^j_i$ for $0 \le j < N_y$ and $0 \le i < N_x$
– There is a gradient (covariant row) for each component $y^j$
§ Einstein notation
$$y^j = A^j_i x^i = \sum_{i=0}^{N_x-1} A^j_i x^i$$
– Leave out the sum to make the notation cleaner
– Always sum over any index that appears twice
§ In matrix form: $y = A x$, where $A$ is $N_y \times N_x$, $x$ is $N_x \times 1$, and $y$ is $N_y \times 1$.

[Figure: picture of the matrix-vector multiplication, with the rows of $A$ covariant and $x$, $y$ contravariant.]
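The index form $y^j = A^j_i x^i$ is exactly an ordinary matrix-vector product. A short sketch (sizes are illustrative):

```python
import numpy as np

Ny, Nx = 2, 3
A = np.arange(6, dtype=float).reshape(Ny, Nx)  # 2-D tensor A^j_i
x = np.array([1.0, 2.0, 3.0])                  # contravariant x^i

# y^j = A^j_i x^i : sum over the repeated index i.
y = np.einsum("ji,i->j", A, x)

assert np.allclose(y, A @ x)
```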
Tensor Products
§ Einstein notation
$$y^{j_1,j_2} = A^{j_1,j_2}_{i_1,i_2}\, x^{i_1,i_2}$$
§ 2-D Contravariant vectors:
– $x^{i_1,i_2}$ for $0 \le i_1, i_2 < N_x$
– $y^{j_1,j_2}$ for $0 \le j_1, j_2 < N_y$
§ 4-D Tensor:
– $A^{j_1,j_2}_{i_1,i_2}$ for $0 \le j_1, j_2 < N_y$ and $0 \le i_1, i_2 < N_x$
– $A$ is known as a tensor
• 2-D covariant input
• 2-D contravariant output
§ Sum over pairs of repeated indices.
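The 4-D contraction above can be sketched with `einsum`; summing over the index pair $(i_1, i_2)$ is equivalent to flattening $A$ into an $(N_y N_y) \times (N_x N_x)$ matrix acting on the flattened input (sizes and the random data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
Nx, Ny = 3, 2
A = rng.standard_normal((Ny, Ny, Nx, Nx))  # A^{j1,j2}_{i1,i2}
x = rng.standard_normal((Nx, Nx))          # x^{i1,i2}

# y^{j1,j2} = A^{j1,j2}_{i1,i2} x^{i1,i2} : sum over the pair (i1, i2).
y = np.einsum("JKjk,jk->JK", A, x)

# Equivalent flattened view: a (Ny*Ny) x (Nx*Nx) matrix times a vector.
y_flat = A.reshape(Ny * Ny, Nx * Nx) @ x.reshape(Nx * Nx)
assert np.allclose(y, y_flat.reshape(Ny, Ny))
```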
Picture of a Tensor Product
§ For example, if we have
$$y^j = A^j_{i_1,i_2}\, x^{i_1,i_2}$$
– Input: 2-D image $x$ indexed by $i_1, i_2$ (a contravariant 2-D tensor)
– Output: 1-D vector $y$ indexed by $j$
– The 3-D tensor $A$ has one 2-D covariant slice for each output index $j$
– General idea: $x \in \Re^{N_1 \times N_2}$

[Figure: the 3-D tensor $A$ contracts the 2-D image $x$ into the 1-D vector $y$.]
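This image-to-vector contraction can be sketched the same way: each component $y^j$ is the full contraction of one 2-D slice of $A$ with the image (sizes and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N1, N2, Ny = 4, 5, 3
A = rng.standard_normal((Ny, N1, N2))  # 3-D tensor A^j_{i1,i2}
x = rng.standard_normal((N1, N2))      # 2-D "image" x^{i1,i2}

# y^j = A^j_{i1,i2} x^{i1,i2}: each output component is the full
# contraction of one 2-D covariant slice of A with the image.
y = np.einsum("jkl,kl->j", A, x)

assert np.allclose(y, A.reshape(Ny, -1) @ x.ravel())
```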
Some Useful Definitions
§ Delta functions:
$$\delta^j_i = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$$
– So then we have that $A^i_j = A^i_k\, \delta^k_j$
§ Gradient w.r.t. a vector: $\nabla_x (Ax) = A$
– In tensor notation, we have that
$$[\nabla_x (Ax)]^j_i = A^j_i$$
§ Gradient w.r.t. a matrix: $\nabla_A (Ax) = \,?$
– First, it must have 1 output dimension and 2 input dimensions. So we have that
$$[\nabla_A (Ax)]^j_{i_1,i_2} = \delta^j_{i_1}\, x^{i_2}$$
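The formula $[\nabla_A(Ax)]^j_{i_1,i_2} = \delta^j_{i_1} x^{i_2}$ can be sanity-checked against finite differences. A sketch (the sizes and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
Ny, Nx = 3, 4
A = rng.standard_normal((Ny, Nx))
x = rng.standard_normal(Nx)

# Analytic gradient: [grad_A(Ax)]^j_{i1,i2} = delta^j_{i1} * x^{i2}.
delta = np.eye(Ny)
grad = np.einsum("ji,k->jik", delta, x)   # shape (Ny, Ny, Nx)

# Finite-difference check of d(Ax)^j / dA^{i1}_{i2}.
eps = 1e-6
fd = np.zeros((Ny, Ny, Nx))
for i1 in range(Ny):
    for i2 in range(Nx):
        Ap = A.copy(); Ap[i1, i2] += eps
        fd[:, i1, i2] = (Ap @ x - A @ x) / eps

assert np.allclose(grad, fd, atol=1e-4)
```

Because $Ax$ is linear in $A$, the finite-difference quotient is exact up to rounding, so the two tensors agree closely.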
GD for Single Layer NN
o Structure of the Gradient
o Gradients for NN parameters
o Updates for NN parameters
Gradient Direction for Single Layer NN
§ Single layer NN:
$$y_\theta(x) = b\, f(Ax + c)$$
– We will need the gradient w.r.t. the parameters $\theta = (A, b, c)$:
$$\nabla_\theta y_\theta(x) = \left[ \nabla_A y_{A,b,c}(x),\ \nabla_b y_{A,b,c}(x),\ \nabla_c y_{A,b,c}(x) \right]$$
– Later, we will also need:
$$\nabla_x y_\theta(x)$$

[Figure: block diagram $x \to A \to Ax + c \to f(\cdot) \to b \to y_\theta(x)$.]
Gradient Structure for Single Layer NN
§ Single layer NN:
$$y_\theta(x) = b\, f(Ax + c)$$
– Parameters are $\theta = (A, b, c)$
– We will need the parameter gradients:
$$\nabla_\theta y_\theta(x) = \left[ \nabla_A y_{A,b,c}(x),\ \nabla_b y_{A,b,c}(x),\ \nabla_c y_{A,b,c}(x) \right]$$
– And the input gradient:
$$\nabla_x y_\theta(x)$$

[Figure: block diagram of the network with each gradient attached to its stage: $\nabla_A y$ at $A$, $\nabla_c y$ at $c$, $\nabla_b y$ at $b$, and $\nabla_x y$ at the input.]
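A minimal forward-pass sketch of this single-layer network. The slides do not fix a particular nonlinearity, so the ReLU here (and all sizes) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
N0, N1, N2 = 4, 5, 2          # input, hidden, output widths (illustrative)
A = rng.standard_normal((N1, N0))
c = rng.standard_normal(N1)
b = rng.standard_normal((N2, N1))

def f(z):
    # Componentwise nonlinearity; ReLU is chosen only as an example.
    return np.maximum(z, 0.0)

def y_theta(x):
    # y_theta(x) = b f(Ax + c)
    return b @ f(A @ x + c)

x = rng.standard_normal(N0)
y = y_theta(x)
assert y.shape == (N2,)
```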
Gradient w.r.t. x
§ For this case,
$$y_\theta(x) = b\, f(Ax + c)$$
– Using Einstein notation,
$$\nabla_x y = \nabla_x y_\theta(x) = b\, [\nabla f](Ax + c)\, A$$
$$[\nabla_x y]^{j}_{i} = b^{j}_{j_1}\, [\nabla f]^{j_1}_{j_2}\, A^{j_2}_{i}$$
– This is a matrix or 2-D tensor.
Gradient w.r.t. A
§ For this case,
$$y_\theta(x) = b\, f(Ax + c)$$
– Using Einstein notation, since
$$[\nabla_A (Ax)]^{j}_{i_1,i_2} = \delta^{j}_{i_1}\, x^{i_2}$$
– Then
$$\nabla_A y = b\, [\nabla f](Ax + c)\, \nabla_A (Ax)$$
$$[\nabla_A y]^{j}_{i_1,i_2} = b^{j}_{j_1}\, [\nabla f]^{j_1}_{j_2}\, [\nabla_A (Ax)]^{j_2}_{i_1,i_2} = b^{j}_{j_1}\, [\nabla f]^{j_1}_{j_2}\, \delta^{j_2}_{i_1} x^{i_2} = b^{j}_{j_1}\, [\nabla f]^{j_1}_{i_1}\, x^{i_2}$$
– This is a 3-D tensor!
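The 3-D tensor formula $[\nabla_A y]^j_{i_1,i_2} = b^j_{j_1} [\nabla f]^{j_1}_{i_1} x^{i_2}$ can be verified against finite differences. A sketch; the smooth tanh nonlinearity and all sizes are illustrative assumptions (tanh keeps the finite-difference check well behaved):

```python
import numpy as np

rng = np.random.default_rng(4)
N0, N1, N2 = 3, 4, 2
A = rng.standard_normal((N1, N0))
c = rng.standard_normal(N1)
b = rng.standard_normal((N2, N1))
x = rng.standard_normal(N0)

f = np.tanh                                # smooth nonlinearity (illustrative)
fprime = lambda z: 1.0 - np.tanh(z) ** 2   # its componentwise derivative

z = A @ x + c
# [grad_A y]^j_{i1,i2} = b^j_{j1} [grad f]^{j1}_{i1} x^{i2},
# where [grad f] is the diagonal Jacobian of the componentwise f.
grad = np.einsum("jk,k,i->jki", b, fprime(z), x)   # shape (N2, N1, N0)

# Central finite differences over each entry of A.
eps = 1e-6
fd = np.zeros((N2, N1, N0))
for i1 in range(N1):
    for i2 in range(N0):
        Ap = A.copy(); Ap[i1, i2] += eps
        Am = A.copy(); Am[i1, i2] -= eps
        fd[:, i1, i2] = (b @ f(Ap @ x + c) - b @ f(Am @ x + c)) / (2 * eps)

assert np.allclose(grad, fd, atol=1e-5)
```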
Gradient w.r.t. c
§ For this case,
$$y_\theta(x) = b\, f(Ax + c)$$
– Using Einstein notation,
$$\nabla_c y = b\, [\nabla f](Ax + c)\, \nabla_c (Ax + c) = b\, [\nabla f]\, I = b\, [\nabla f]$$
$$[\nabla_c y]^{j}_{i} = b^{j}_{j_1}\, [\nabla f]^{j_1}_{i}$$
Gradient w.r.t. b
§ For this case,
$$y_\theta(x) = b\, f(Ax + c)$$
– Using Einstein notation, since
$$[\nabla_b (b f)]^{j}_{i_1,i_2} = \delta^{j}_{i_1}\, f^{i_2}$$
– We have that
$$[\nabla_b y]^{j}_{i_1,i_2} = \delta^{j}_{i_1}\, f^{i_2}, \qquad \text{where } f = f(Ax + c)$$
– This is a 3-D tensor!
Update Direction for A
§ Gradient step: $\theta \leftarrow \theta + \alpha d$, where $d$ is given by
$$d_{i_1,i_2} = -\left[\nabla_A L(\theta)\right]_{i_1,i_2} = \sum_{n=0}^{N_t-1} \epsilon_{n,j_1}\, b^{j_1}_{j_2}\, [\nabla f_n]^{j_2}_{i_1}\, x_n^{i_2}$$
– where $\epsilon_n$ is the $1 \times N_2$ row vector of negative loss gradients w.r.t. the network output for training sample $n$, and $\nabla f_n = \nabla f(Ax_n + c)$.
§ For efficiency, computation goes this way (left to right):
$$\sum_{n=0}^{N_t-1} \underbrace{\epsilon_n}_{1 \times N_2}\ \underbrace{b}_{N_2 \times N_1}\ \underbrace{\nabla f_n}_{N_1 \times N_1}\ \otimes\, x_n$$
– Contract $\epsilon_n b$ first (a $1 \times N_1$ row), apply $\nabla f_n$, then take the outer product with $x_n$.
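The "computation goes this way" remark is about associativity: contracting $\epsilon_n b$ first keeps every intermediate a small row vector instead of a large tensor. A sketch with illustrative sizes and random stand-ins for the per-sample quantities (names like `eps_` and `fp` are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)
N0, N1, N2, Nt = 6, 5, 3, 10
b = rng.standard_normal((N2, N1))
eps_ = rng.standard_normal((Nt, N2))   # per-sample output-loss gradients
fp = rng.standard_normal((Nt, N1))     # diagonal of grad f_n
X = rng.standard_normal((Nt, N0))      # training inputs x_n

# Left to right: row = (eps_n b) scaled by diag(grad f_n), then the
# outer product with x_n; accumulated over the training samples.
d = np.zeros((N1, N0))
for n in range(Nt):
    row = (eps_[n] @ b) * fp[n]        # 1 x N1 at every step
    d += np.outer(row, X[n])           # N1 x N0

# One-shot einsum of the same contraction, for comparison.
d_ein = np.einsum("nj,jk,nk,ni->ki", eps_, b, fp, X)
assert np.allclose(d, d_ein)
```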
Update Direction for c
§ Gradient step: $\theta \leftarrow \theta + \alpha d$, where $d$ is given by
$$d_{i} = -\left[\nabla_c L(\theta)\right]_{i} = \sum_{n=0}^{N_t-1} \epsilon_{n,j_1}\, b^{j_1}_{j_2}\, [\nabla f_n]^{j_2}_{i}$$
§ For efficiency, computation goes this way (left to right):
$$\sum_{n=0}^{N_t-1} \underbrace{\epsilon_n}_{1 \times N_2}\ \underbrace{b}_{N_2 \times N_1}\ \underbrace{\nabla f_n}_{N_1 \times N_1}$$
– The result is a length-$N_1$ vector, matching the shape of $c$.
Update Direction for b
§ Gradient step: $\theta \leftarrow \theta + \alpha d$, where $d$ is given by
$$d_{i_1,i_2} = -\left[\nabla_b L(\theta)\right]_{i_1,i_2} = \sum_{n=0}^{N_t-1} \epsilon_{n,i_1}\, f_n^{i_2}$$
§ For efficiency, computation goes this way:
$$\sum_{n=0}^{N_t-1} \underbrace{\epsilon_n}_{1 \times N_2} \otimes\, \underbrace{f_n}_{N_1}$$
– Each term is the outer product of $\epsilon_n$ with $f_n = f(Ax_n + c)$, giving an $N_2 \times N_1$ result matching the shape of $b$.
Local and Global Minima
o Open and Closed Sets
o Convex Sets and Functions
o Properties of Convex Functions
o Local Minimum, Saddle Points, and Global Minima
o Optimization Theorems
Open and Closed Sets
§ Define:
– $S \subset \Re^N$
– The open ball of radius $\epsilon$ centered at $x_0$ is $B(x_0, \epsilon) = \{ x \in \Re^N : \| x - x_0 \| < \epsilon \}$.
§ A set $S$ is open if
– At every point, there is an open ball contained in $S$:
– $\forall x \in S, \exists \epsilon > 0$ s.t. $B(x, \epsilon) \subset S$.
§ A set $S$ is closed if $S^c = \Re^N - S$ is open.
§ A set $S$ is compact if it is closed and bounded.
§ Facts:
– $\Re^N$ is both open and closed, but it is not compact.
– If $S$ is compact, then every sequence in $S$ has a limit point in $S$.
Convex Sets
§ A set $S$ is convex if
$$\forall \lambda \in [0,1],\ \forall x, y \in S:\quad \lambda y + (1 - \lambda) x \in S$$
§ Properties:
– The intersection of convex sets is convex.

[Figure: in a nonconvex set, the line connecting two points is sometimes outside the set; in a convex set, it is always inside.]
Convex Functions
§ Let $f : S \to \Re$ where $S$ is a convex set. Then we say that $f$ is a convex function if
$$\forall \lambda \in [0,1],\ \forall x, y \in S:\quad f(\lambda y + (1 - \lambda) x) \le \lambda f(y) + (1 - \lambda) f(x)$$
§ Properties:
– The sum of convex functions is convex.
– The maximum of a set of convex functions is convex.
– If $f(x)$ is convex, then $f(Ax)$ is also convex.
– $f(x)$ is concave if $-f(x)$ is convex.
– $f(x) = Ax + b$ is both convex and concave.

[Figure: for a convex function, the chord connecting two points on the graph lies above the function; for a nonconvex function, it is sometimes below.]
Properties of Convex Functions
§ Valuable properties of convex functions
• The sum of convex functions is convex.
– Let $f(x) = \sum_i f_i(x)$. If the $f_i(x)$ are convex, then $f(x)$ is convex.
• The maximum of a set of convex functions is convex.
– Let $f(x) = \max_i f_i(x)$. If the $f_i(x)$ are convex, then $f(x)$ is convex.
• The second derivative of a convex function is non-negative.
– Let $f(x)$ have two continuous derivatives, and let $H(x) = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j} \right]$ be the Hessian of $f$ at $x$. Then $f(x)$ is convex if and only if $H(x)$ is non-negative definite for all $x$.
• A convex function of a linear transform is convex.
– If $f(x)$ is convex, then $f(Ax)$ is also convex.
• An affine transform is both convex and concave.
– $f(x) = Ax + b$ is both convex and concave.
• A function is concave if its negative is convex.
– $f(x)$ is concave if $-f(x)$ is convex.
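The max-of-convex-functions property can be spot-checked numerically by sampling the defining inequality. A random check cannot prove convexity, only fail to refute it; the two component functions below are illustrative choices:

```python
import numpy as np

# f(x) = max(f1(x), f2(x)) with f1, f2 convex; check the convexity
# inequality f(l*y + (1-l)*x) <= l*f(y) + (1-l)*f(x) at random points.
f1 = lambda x: (x - 1.0) ** 2       # convex parabola
f2 = lambda x: np.abs(x) + 0.5      # convex V-shape
f = lambda x: max(f1(x), f2(x))

rng = np.random.default_rng(6)
for _ in range(1000):
    x, y = rng.uniform(-5, 5, size=2)
    lam = rng.uniform(0, 1)
    lhs = f(lam * y + (1 - lam) * x)
    rhs = lam * f(y) + (1 - lam) * f(x)
    assert lhs <= rhs + 1e-12       # Jensen-type inequality never violated
```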
Local Minimum
§ Let $f : S \to \Re$ where $S \subset \Re^N$.
– We say that $x^* \in S$ is a local minimum of $f$ if $\exists \epsilon > 0$ s.t. $\forall x \in S \cap B(x^*, \epsilon)$, $f(x) \ge f(x^*)$.
• Necessary condition for local minima:
– Let $f$ be continuously differentiable and let $x^*$ be a local minimum; then $\nabla f(x^*) = 0$.

[Figure: a 1-D function with a local minimum at $x^*$, where $\nabla f(x^*) = 0$.]
Saddle Point
§ Let $f : S \to \Re$, where $S \subset \Re^N$, be a continuously differentiable function.
– We say that $x^* \in S$ is a saddle point of $f$ if $\nabla f(x^*) = 0$ and $x^*$ is not a local minimum.
– Saddle points can cause more problems than you might think.

[Figures: a saddle point in 1-D, and the graph of $f(x) = x_1^2 - x_2^2$ showing a saddle point in 2-D (shared from Wikipedia).]
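For $f(x) = x_1^2 - x_2^2$, the gradient vanishes at the origin even though the origin is not a local minimum. A quick numeric check:

```python
import numpy as np

def f(x):
    return x[0] ** 2 - x[1] ** 2

def grad_f(x):
    return np.array([2 * x[0], -2 * x[1]])

x0 = np.array([0.0, 0.0])
assert np.allclose(grad_f(x0), 0.0)    # stationary point

t = 1e-3
assert f(np.array([0.0, t])) < f(x0)   # f decreases along x2: not a minimum
assert f(np.array([t, 0.0])) > f(x0)   # but increases along x1: a saddle
```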
Global Minimum
§ Let $f : S \to \Re$ where $S \subset \Re^N$.
– We say that $x^* \in S$ is a global minimum of $f$ if $\forall x \in S$, $f(x) \ge f(x^*)$.
§ Comments:
– In general, finding the global minimum is difficult.
– Gradient descent optimization typically becomes trapped in local minima.

[Figure: a nonconvex function where GD steps become trapped in a local minimum away from the global minimum $x^*$.]
Optimization Theorems
§ Let $f : S \to \Re$ where $S \subset \Re^N$.
– If $f$ is continuous and $S$ is compact, then $f$ takes on a global minimum in $S$.
– If $f$ is convex on $S$, then any local minimum is a global minimum.
– If $f$ is continuously differentiable and convex on $S$, then $\nabla f(x^*) = 0$ implies that $x^* \in S$ is a global minimum of $f$.
§ Important facts:
– The global minimum may not be unique.
– If $S$ is closed but not bounded, then $f$ may not take on a global minimum.
– Generally speaking, gradient descent algorithms converge to the global minimum of continuously differentiable convex functions.
– Most interesting functions in ML are not convex!

[Figure: GD steps on a convex function converging to the global minimum.]
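On a smooth convex function, plain gradient descent with a small enough step converges to the global minimum. A sketch on the convex quadratic $f(x) = \frac{1}{2} x^T Q x$ (the matrix $Q$, step size, and iteration count are illustrative):

```python
import numpy as np

# Convex quadratic f(x) = 0.5 x^T Q x with Q positive definite,
# so the unique global minimum is x* = 0.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])

def grad_f(x):
    return Q @ x

x = np.array([4.0, -3.0])
alpha = 0.1                        # step size below 2 / lambda_max(Q)
for _ in range(500):
    x = x - alpha * grad_f(x)      # theta <- theta + alpha * d, d = -grad

assert np.linalg.norm(x) < 1e-6    # converged to the global minimum
```

With the step size above the iteration contracts by a factor $\max_j |1 - \alpha \lambda_j| < 1$ per step, which is why convergence here is geometric; on a nonconvex loss the same iteration can instead stop at a local minimum or saddle point, as the earlier slides illustrate.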