
Approximating Real-Valued Functions from Data: Neural Networks

Vasant Honavar
Artificial Intelligence Research Laboratory
Informatics Graduate Program
Computer Science and Engineering Graduate Program
Bioinformatics and Genomics Graduate Program
Neuroscience Graduate Program
Center for Big Data Analytics and Discovery Informatics
Huck Institutes of the Life Sciences
Institute for Cyberscience
Clinical and Translational Sciences Institute
Northeast Big Data Hub
Pennsylvania State University

[email protected]
http://faculty.ist.psu.edu/vhonavar
http://ailab.ist.psu.edu

Fall 2018

Neural Networks

• Learning to approximate real-valued functions
• Bayesian recipe for learning real-valued functions
• Review: learning linear functions using gradient descent in weight space
• Universal function approximation theorem
• Learning nonlinear functions using gradient descent in weight space
• Practical considerations and examples

Review: Bayesian Recipe for Learning

• P(h) = prior probability of hypothesis h
• P(D) = prior probability of training data D
• P(h | D) = probability of h given D
• P(D | h) = probability of D given h

$$P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}$$

Bayesian recipe for learning

Choose the most likely hypothesis given the data:

$$h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h) \quad \text{(Maximum a posteriori hypothesis)}$$

If $P(h_i) = P(h_j)$ for all $h_i, h_j \in H$, then

$$h_{ML} = \arg\max_{h \in H} P(D \mid h) \quad \text{(Maximum likelihood hypothesis)}$$

Learning a Real Valued Function

• Consider a real-valued target function f
• Training examples ⟨x_i, d_i⟩, where d_i is a noisy training value: d_i = f(x_i) + e_i
• e_i is a random variable (noise) drawn independently for each x_i according to a Gaussian distribution with zero mean
• ⇒ d_i has mean f(x_i) and the same variance

Then the maximum likelihood hypothesis h_ML is the one that minimizes the sum of squared errors:

$$h_{ML} = \arg\min_{h \in H} \sum_{i=1}^{m} \big(d_i - h(x_i)\big)^2$$

Learning a Real Valued Function

$$h_{ML} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h) = \arg\max_{h \in H} \prod_{i=1}^{m} p(d_i, x_i \mid h)$$

$$= \arg\max_{h \in H} \prod_{i=1}^{m} p(d_i \mid h, x_i)\, P(x_i) = \arg\max_{h \in H} \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{d_i - h(x_i)}{\sigma}\right)^2} \prod_{i=1}^{m} P(x_i)$$

Maximize the natural log of this instead...

Learning a Real Valued Function

$$h_{ML} = \arg\max_{h \in H} \sum_{i=1}^{m} \left[\ln\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2}\left(\frac{d_i - h(x_i)}{\sigma}\right)^2\right] = \arg\max_{h \in H} \sum_{i=1}^{m} -\frac{1}{2}\left(\frac{d_i - h(x_i)}{\sigma}\right)^2$$

$$= \arg\max_{h \in H} \sum_{i=1}^{m} -\big(d_i - h(x_i)\big)^2 = \arg\min_{h \in H} \sum_{i=1}^{m} \big(d_i - h(x_i)\big)^2$$

The maximum likelihood hypothesis is the one that minimizes the mean squared error!

Approximating a linear function using a linear neuron

[Figure: a linear neuron with inputs x_1, ..., x_n, a bias input x_0 = 1, synaptic weights w_0, ..., w_n, and output y]

$$y = \sum_{i=0}^{n} w_i x_i$$

Learning Task

$W = [W_0, \ldots, W_n]^T$ is the weight vector
$X_p = [X_{0p}, \ldots, X_{np}]^T$ is the p-th training sample
$y_p = \sum_i W_i X_{ip} = W \cdot X_p$ is the output of the neuron for input $X_p$
$d_p = f(X_p)$ is the desired output for input $X_p$
$e_p = (d_p - y_p)$ is the error of the neuron on input $X_p$
$S = \{(X_p, d_p)\}$ is a (multi)set of training examples
$E_S(W) = E_S(W_0, W_1, \ldots, W_n) = \frac{1}{2}\sum_p e_p^2$ is the estimated error of $W$ on the training set $S$

Goal: Find $W^* = \arg\min_W E_S(W)$

Learning linear functions

[Figure: the error surface E_S over the weights (w_0, w_1), with the current weight vector W_t descending toward the minimum W*]

The error is a quadratic function of the weights in the case of a linear neuron.

Learning linear functions

$$\frac{\partial E}{\partial w_i} = \frac{1}{2}\frac{\partial}{\partial w_i}\left(\sum_p e_p^2\right) = \frac{1}{2}\sum_p \frac{\partial}{\partial w_i}\big(e_p^2\big) = \frac{1}{2}\sum_p \big(2e_p\big)\frac{\partial e_p}{\partial w_i} = \sum_p e_p \left(\frac{\partial e_p}{\partial y_p}\right)\left(\frac{\partial y_p}{\partial w_i}\right)$$

$$= \sum_p e_p\,(-1)\,\frac{\partial}{\partial w_i}\left(\sum_{j=0}^{n} w_j x_{jp}\right) = -\sum_p \big(d_p - y_p\big)\,\frac{\partial}{\partial w_i}\left(w_i x_{ip} + \sum_{j\neq i} w_j x_{jp}\right) = -\sum_p \big(d_p - y_p\big)\, x_{ip}$$

$$w_i \leftarrow w_i - \eta\,\frac{\partial E}{\partial w_i} \quad\Longrightarrow\quad w_i \leftarrow w_i + \eta\sum_p \big(d_p - y_p\big)\, x_{ip}$$

Least Mean Square Error (LMSE) Learning Rule

Batch update: $w_i \leftarrow w_i + \eta \sum_p (d_p - y_p)\, x_{ip}$

Per-sample update: $w_i \leftarrow w_i + \eta\, (d_p - y_p)\, x_{ip}$
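A minimal sketch of both update modes in Python (NumPy), assuming a bias input x_0 = 1 is prepended to each sample; the data, learning rates, and epoch counts below are illustrative choices, not prescriptions from the slides:

```python
import numpy as np

def lms_batch(X, d, eta=0.01, epochs=200):
    """Batch LMSE: accumulate the gradient over all samples, then update."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        y = X @ w                    # outputs for all samples
        w += eta * (d - y) @ X       # w_i <- w_i + eta * sum_p (d_p - y_p) x_ip
    return w

def lms_per_sample(X, d, eta=0.05, epochs=100):
    """Per-sample LMSE: update after each training example."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_p, d_p in zip(X, d):
            w += eta * (d_p - w @ x_p) * x_p
    return w

# Illustrative data: d = 0.5 + 2*x1 - x2, with a prepended bias column x_0 = 1
rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.uniform(-1, 1, (50, 2))])
d = 0.5 * X[:, 0] + 2 * X[:, 1] - X[:, 2]
print(lms_batch(X, d))   # approximately [0.5, 2.0, -1.0]
```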

Momentum update

$$w_i(t+1) = w_i(t) + \Delta w_i(t)$$

$$\Delta w_i(t) = -\eta\,\frac{\partial E}{\partial w_i}\bigg|_{w_i = w_i(t)} + \alpha\,\Delta w_i(t-1), \qquad 0 \le \alpha < 1$$

The momentum update allows the effective learning rate to increase when feasible and decrease when necessary. Converges for 0 ≤ α < 1.
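A sketch of the momentum variant of the per-sample LMSE rule above (NumPy); the buffer `dw` holds Δw(t−1), and the particular η and α values are illustrative:

```python
import numpy as np

def lms_momentum(X, d, eta=0.02, alpha=0.9, epochs=100):
    """Per-sample LMSE with momentum: dw(t) = -eta*grad + alpha*dw(t-1)."""
    w = np.zeros(X.shape[1])
    dw = np.zeros_like(w)                    # previous update, dw(t-1)
    for _ in range(epochs):
        for x_p, d_p in zip(X, d):
            grad = -(d_p - w @ x_p) * x_p    # dE_p/dw for the squared error
            dw = -eta * grad + alpha * dw
            w += dw
    return w
```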

Learning approximations of nonlinear functions from data – the generalized delta rule

• Motivations
• Universal function approximation theorem (UFAT)
• Derivation of the generalized delta rule
• Back-propagation algorithm
• Practical considerations
• Applications

Motivations

• Psychology – empirical inadequacy of behaviorist theories of learning: simple reward-punishment based learning models are incapable of learning functions (e.g., exclusive OR) which are readily learned by animals (e.g., monkeys)
• Artificial Intelligence – the need for learning highly nonlinear functions where the form of the nonlinear relationship is unknown a priori
• Statistics – limitations of linear regression in fitting data when the relationship is highly nonlinear and the form of the relationship is unknown
• Control – need for nonlinear control methods

Universal function approximation theorem (UFAT) (Cybenko, 1989)

• Let φ: ℝ → ℝ be a non-constant, bounded (hence non-linear), monotone, continuous function. Let I_N be the N-dimensional unit hypercube in ℝ^N.
• Let C(I_N) = {f: I_N → ℝ} be the set of all continuous functions with domain I_N and range ℝ. Then for any function f ∈ C(I_N) and any ε > 0, there exist an integer L and sets of real values θ, α_j, θ_j, w_ji (1 ≤ j ≤ L; 1 ≤ i ≤ N) such that

$$F(x_1, x_2, \ldots, x_N) = \sum_{j=1}^{L} \alpha_j\, \varphi\!\left(\sum_{i=1}^{N} w_{ji}\, x_i - \theta_j\right) - \theta$$

is a uniform approximation of f – that is,

$$\forall (x_1, \ldots, x_N) \in I_N, \quad \big|F(x_1, \ldots, x_N) - f(x_1, \ldots, x_N)\big| < \varepsilon$$

Universal function approximation theorem (UFAT)

• Unlike Kolmogorov's theorem, UFAT requires only one kind of nonlinearity to approximate any arbitrary nonlinear function to any desired accuracy:

$$F(x_1, x_2, \ldots, x_N) = \sum_{j=1}^{L} \alpha_j\, \varphi\!\left(\sum_{i=1}^{N} w_{ji}\, x_i - \theta_j\right) - \theta$$

• The sigmoid function satisfies the UFAT requirements:

$$\varphi(z) = \frac{1}{1 + e^{-az}},\ a > 0; \qquad \lim_{z\to-\infty}\varphi(z) = 0, \quad \lim_{z\to+\infty}\varphi(z) = 1$$

Similar universal approximation properties can be guaranteed for other functions – e.g., radial basis functions.
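A small sketch of the form F guaranteed by UFAT, evaluated in NumPy; the particular L, N, and parameter values here are arbitrary illustrations, not a construction of any specific target f:

```python
import numpy as np

def sigmoid(z, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * z))

def F(x, alpha, W, theta_j, theta):
    """F(x) = sum_j alpha_j * phi(w_j . x - theta_j) - theta  (the UFAT form)."""
    return alpha @ sigmoid(W @ x - theta_j) - theta

# Arbitrary illustration with L = 3 hidden terms and N = 2 inputs
rng = np.random.default_rng(0)
alpha, W = rng.normal(size=3), rng.normal(size=(3, 2))
theta_j, theta = rng.normal(size=3), 0.1
print(F(np.array([0.25, 0.75]), alpha, W, theta_j, theta))
```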

Universal function approximation theorem

• UFAT guarantees the existence of arbitrarily accurate approximations of continuous functions defined over bounded subsets of ℝ^N
• UFAT tells us the representational power of a certain class of multi-layer networks relative to the set of continuous functions defined on bounded subsets of ℝ^N
• UFAT is not constructive – it does not tell us how to choose the parameters to construct a desired function
• To learn an unknown function from data, we need an algorithm to search the hypothesis space of multilayer networks
• The generalized delta rule allows nonlinear functions to be learned from the training data

Feed-forward neural networks

• A feed-forward n-layer network consists of n layers of nodes:
  • 1 layer of input nodes
  • n−2 layers of hidden nodes
  • 1 layer of output nodes
  interconnected by modifiable weights from the input nodes to the hidden nodes and from the hidden nodes to the output nodes
• More general topologies (e.g., with connections that skip layers, such as direct connections between input and output nodes) are possible

[Figure: a three-layer network that approximates the exclusive OR function]

Three-layer feed-forward neural network

• A single bias unit is connected to each unit other than the input units
• Net input:

$$n_j = \sum_{i=1}^{d} x_i w_{ji} + w_{j0} = \sum_{i=0}^{d} x_i w_{ji} \equiv W_j \cdot X$$

where the subscript i indexes units in the input layer and j indexes units in the hidden layer; w_ji denotes the input-to-hidden weights at hidden unit j.
• The output of a hidden unit is a nonlinear function of its net input, i.e., y_j = f(n_j), e.g.,

$$y_j = \frac{1}{1 + e^{-n_j}}$$

Three-layer feed-forward neural network

• Each output unit similarly computes its net activation based on the hidden unit signals:

$$n_k = \sum_{j=1}^{n_H} y_j w_{kj} + w_{k0} = \sum_{j=0}^{n_H} y_j w_{kj} = W_k \cdot Y$$

where the subscript k indexes units in the output layer and n_H denotes the number of hidden units.
• The output can be a linear or nonlinear function of the net input, e.g., z_k = n_k
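A sketch of the forward pass these two slides describe (NumPy), with sigmoid hidden units and linear output units; the layer sizes are illustrative, and the bias units are handled by prepending a constant 1:

```python
import numpy as np

def forward(x, W_hidden, W_out):
    """Forward pass of a 3-layer network: sigmoid hidden units, linear outputs.

    W_hidden: (n_H, d+1) input-to-hidden weights, column 0 holds the biases w_j0.
    W_out:    (M, n_H+1) hidden-to-output weights, column 0 holds the biases w_k0.
    """
    x = np.concatenate(([1.0], x))      # bias unit x_0 = 1
    n_j = W_hidden @ x                  # net input to hidden units
    y = 1.0 / (1.0 + np.exp(-n_j))      # y_j = sigmoid(n_j)
    y = np.concatenate(([1.0], y))      # bias unit for the output layer
    z = W_out @ y                       # linear outputs z_k = n_k
    return y, z

rng = np.random.default_rng(0)
y, z = forward(rng.uniform(-1, 1, 4), rng.normal(size=(3, 5)), rng.normal(size=(2, 4)))
print(z)
```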

Computing nonlinear functions

[Figure: hidden sigmoid units combining to compute a nonlinear function of the inputs]

Realizing nonlinearly separable class boundaries

[Figure: decision regions formed by a multi-layer network for classes that are not linearly separable]

Learning nonlinear real-valued functions

• Given a training set, determine:
  • Network structure – the number of hidden nodes or, more generally, the network topology:
    • Start small and grow the network, or
    • Start with a sufficiently large network and prune away the unnecessary connections
  • For a given structure, the parameters (weights) that minimize the error on the training samples (e.g., the mean squared error)
• For now, we focus on the latter

Generalized delta rule – error back-propagation

• Challenge – we know the desired outputs for nodes in the output layer, but not for the hidden layer
• Need to solve the credit assignment problem – dividing the credit or blame for the performance of the output nodes among the hidden nodes
• The generalized delta rule offers an elegant solution to the credit assignment problem in feed-forward neural networks in which each neuron computes a differentiable function of its inputs
• The solution can be generalized to other kinds of networks, including networks with cycles

Feed-forward networks

• Forward operation (computing the output for a given input based on the current weights)
• Learning – modification of the network parameters (weights) to minimize an appropriate error measure
• Because each neuron computes a differentiable function of its inputs, if the error is a differentiable function of the network outputs, then the error is a differentiable function of the weights in the network – so we can perform gradient descent!

[Figure: a fully connected 3-layer network]

Generalized delta rule

• Let t_kp be the k-th target (or desired) output for input pattern X_p and z_kp be the output produced by the k-th output node, and let W represent all the weights in the network
• Training error:

$$E_S(W) = \sum_p E_p(W) = \frac{1}{2}\sum_p \sum_{k=1}^{M} \big(t_{kp} - z_{kp}\big)^2$$

• The weights are initialized with pseudo-random values and are changed in a direction that will reduce the error:

$$\Delta w_{ji} = -\eta\,\frac{\partial E_S}{\partial w_{ji}}, \qquad \Delta w_{kj} = -\eta\,\frac{\partial E_S}{\partial w_{kj}}$$

Generalized delta rule

η > 0 is a suitable learning rate; W ← W + ΔW.

Hidden-to-output weights:

$$\frac{\partial E_p}{\partial w_{kj}} = \frac{\partial E_p}{\partial n_{kp}} \cdot \frac{\partial n_{kp}}{\partial w_{kj}}, \qquad n_{kp} = \sum_j w_{kj}\, y_{jp}$$

$$\frac{\partial E_p}{\partial n_{kp}} = \frac{\partial E_p}{\partial z_{kp}} \cdot \frac{\partial z_{kp}}{\partial n_{kp}} = -\big(t_{kp} - z_{kp}\big)\, z_{kp}\big(1 - z_{kp}\big)$$

$$w_{kj} \leftarrow w_{kj} - \eta\,\frac{\partial E_p}{\partial w_{kj}} = w_{kj} + \eta\,\big(t_{kp} - z_{kp}\big)\, z_{kp}\big(1 - z_{kp}\big)\, y_{jp} = w_{kj} + \eta\,\delta_{kp}\, y_{jp}$$

where $\delta_{kp} = \big(t_{kp} - z_{kp}\big)\, z_{kp}\big(1 - z_{kp}\big)$ for sigmoid output units.

Generalized delta rule

Weights from input to hidden units:

$$\frac{\partial E_p}{\partial w_{ji}} = \sum_{k=1}^{M} \frac{\partial E_p}{\partial n_{kp}} \cdot \frac{\partial n_{kp}}{\partial y_{jp}} \cdot \frac{\partial y_{jp}}{\partial n_{jp}} \cdot \frac{\partial n_{jp}}{\partial w_{ji}} = \frac{\partial}{\partial y_{jp}}\!\left[\frac{1}{2}\sum_{l=1}^{M}\big(t_{lp} - z_{lp}\big)^2\right] y_{jp}\big(1 - y_{jp}\big)\, x_{ip}$$

$$= -\left(\sum_{k=1}^{M} \delta_{kp}\, w_{kj}\right) y_{jp}\big(1 - y_{jp}\big)\, x_{ip} = -\,\delta_{jp}\, x_{ip}$$

where $\delta_{jp} = y_{jp}\big(1 - y_{jp}\big)\sum_{k=1}^{M} \delta_{kp}\, w_{kj}$, giving the update

$$w_{ji} \leftarrow w_{ji} + \eta\,\delta_{jp}\, x_{ip}$$

Back-propagation algorithm

Start with small random initial weights.
Until the desired stopping criterion is satisfied do:
  Select a training sample from S
  Compute the outputs of all nodes based on the current weights and the input sample
  Compute the weight updates for the output nodes
  Compute the weight updates for the hidden nodes
  Update the weights
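A compact sketch of this loop in Python (NumPy) for the three-layer network described earlier, with sigmoid hidden and output units so the deltas match the two preceding derivation slides; the layer size, learning rate, epoch count, and the XOR data are illustrative choices:

```python
import numpy as np

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

def train_backprop(X, T, n_hidden=4, eta=0.5, epochs=2000, seed=0):
    """Per-sample back-propagation for one hidden layer of sigmoid units."""
    rng = np.random.default_rng(seed)
    d, M = X.shape[1], T.shape[1]
    W_h = rng.uniform(-0.5, 0.5, (n_hidden, d + 1))   # input-to-hidden (with bias)
    W_o = rng.uniform(-0.5, 0.5, (M, n_hidden + 1))   # hidden-to-output (with bias)
    for _ in range(epochs):
        for x_p, t_p in zip(X, T):
            x = np.concatenate(([1.0], x_p))
            y = np.concatenate(([1.0], sigmoid(W_h @ x)))  # hidden outputs
            z = sigmoid(W_o @ y)                           # network outputs
            delta_k = (t_p - z) * z * (1 - z)              # output deltas
            delta_j = y[1:] * (1 - y[1:]) * (W_o[:, 1:].T @ delta_k)  # hidden deltas
            W_o += eta * np.outer(delta_k, y)              # w_kj <- w_kj + eta*delta_k*y_j
            W_h += eta * np.outer(delta_j, x)              # w_ji <- w_ji + eta*delta_j*x_i
    return W_h, W_o

# Illustrative use: learn exclusive OR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W_h, W_o = train_backprop(X, T)
```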

Using neural networks for classification

Network outputs are real valued. How can we use the networks for classification?

Classify a pattern by assigning it to the class that corresponds to the index of the output node with the largest output for the pattern:

$$F(X_p) = \arg\max_k\, z_{kp}$$
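In code this decision rule is a single argmax over the output vector `z` produced by the forward pass sketched earlier:

```python
import numpy as np

def classify(z):
    """Assign the pattern to the class of the largest output: F(X_p) = argmax_k z_kp."""
    return int(np.argmax(z))
```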

Training multi-layer networks – Some Useful Tricks

• Initializing weights to small random values that place the neurons in the linear portion of their operating range for most of the patterns in the training set improves the speed of convergence – e.g., choose input-to-hidden weight magnitudes on the order of 1/√N (for N inputs) with the sign of each weight chosen at random, and hidden-to-output weight magnitudes on the order of 1/√n_H with the sign of each weight chosen at random (see "Initializing weights (revisited)" below for the derivation)

Some Useful Tricks

• Use of a momentum term allows the effective learning rate for each weight to adapt as needed and helps speed up convergence – in a network with 2 layers of weights:

$$w_{ji}(t+1) = w_{ji}(t) + \Delta w_{ji}(t), \qquad \Delta w_{ji}(t) = -\eta\,\frac{\partial E_S}{\partial w_{ji}}\bigg|_{w_{ji} = w_{ji}(t)} + \alpha\,\Delta w_{ji}(t-1)$$

$$w_{kj}(t+1) = w_{kj}(t) + \Delta w_{kj}(t), \qquad \Delta w_{kj}(t) = -\eta\,\frac{\partial E_S}{\partial w_{kj}}\bigg|_{w_{kj} = w_{kj}(t)} + \alpha\,\Delta w_{kj}(t-1)$$

where $0 < \alpha < 1$ and $0 < \eta \le 1$, with typical values of η from 0.05 to 0.6 and α from 0.8 to 0.9.

Some Useful Tricks

• Use of a sigmoid function which satisfies φ(–z) = –φ(z) helps speed up convergence, e.g.,

$$\varphi(z) = a\left(\frac{e^{bz} - e^{-bz}}{e^{bz} + e^{-bz}}\right) = a \tanh(bz)$$

with $a = 1.716$ and $b = 2/3$, which gives $\dfrac{\partial \varphi}{\partial z}\bigg|_{z=0} \approx 1$, and φ is linear in the range $-1 < z < 1$.

Some Useful Tricks

• Randomizing the order of presentation of training examples from one pass to the next helps avoid local minima
• Introducing small amounts of noise into the weight updates (or into the examples) during training helps improve generalization – it minimizes overfitting, makes the learned approximation more robust to noise, and helps avoid local minima
• If using the suggested antisymmetric sigmoid nodes in the output layer, set the target output to 1 for the target class and -1 for all others

Some Useful Tricks

• Regularization helps avoid overfitting and improves generalization:

$$R(W) = \lambda\, E(W) + (1-\lambda)\, C(W), \qquad 0 \le \lambda \le 1$$

$$C(W) = \frac{1}{2}\left(\sum w_{ji}^2 + \sum w_{kj}^2\right), \qquad \frac{\partial C}{\partial w_{ji}} = w_{ji} \ \text{ and } \ \frac{\partial C}{\partial w_{kj}} = w_{kj}$$

Start with λ close to 1 and gradually lower it during training. When λ < 1, the complexity term tends to drive the weights toward zero, setting up a tension between error reduction and complexity minimization.
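A sketch of how the regularized criterion changes a gradient-descent step (illustrative; `grad_E` stands for the error gradient ∂E_S/∂w computed by back-propagation):

```python
def regularized_update(w, grad_E, eta=0.1, lam=0.95):
    """Gradient step on R(W) = lam*E(W) + (1-lam)*C(W), C(W) = 0.5*sum(w**2),
    so dR/dw = lam*grad_E + (1-lam)*w; the second term decays the weights."""
    return w - eta * (lam * grad_E + (1 - lam) * w)
```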

Some Useful Tricks

Input and output encodings
• Do not eliminate natural proximity in the input or output space
• Do not normalize input patterns to be of unit length if the length is likely to be relevant for distinguishing between classes
• Do not introduce unwarranted proximity as an artifact
• Do not use log₂ M outputs to encode M classes; use M outputs instead to avoid spurious proximity in the output space
• Use error-correcting codes when feasible

Some Useful Tricks

Example of a good code
• Binary thermometer codes for encoding real values
• Suppose we can use 10 bits to represent a value between 0 and 1
• We can quantize the interval [0, 1] into 10 equal parts
• 0.38 in thermometer code is 1111000000
• 0.60 in thermometer code is 1111110000
• Note that values that are close along the real number line have thermometer codes that are close in Hamming distance

Example of a bad code
• Ordinary binary representations of integers
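A sketch of the encoder implied by the examples above, assuming values in [0, 1] and 10 bits; rounding at the interval boundaries is an implementation choice the slide leaves open:

```python
def thermometer(value, n_bits=10):
    """Encode a value in [0, 1] as a thermometer code: round(value*n_bits) leading ones."""
    ones = round(value * n_bits)
    return "1" * ones + "0" * (n_bits - ones)

print(thermometer(0.38))  # 1111000000
print(thermometer(0.60))  # 1111110000
```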

Some Useful Tricks

• Normalizing inputs – know when and when not to normalize
• Scale each component of the input separately so that it has mean 0 and standard deviation 1:

$$\mu_i = \frac{1}{P}\sum_{q=1}^{P} x_{qi}, \qquad \sigma_i^2 = \frac{1}{P}\sum_{q=1}^{P}\big(x_{qi} - \mu_i\big)^2, \qquad x_{pi} \leftarrow \frac{x_{pi} - \mu_i}{\sigma_i}$$
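A one-line version of this standardization in NumPy, assuming `X` holds one pattern per row:

```python
import numpy as np

def standardize(X):
    """Scale each input component to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```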

Some Useful Tricks

Initializing weights (revisited)

Suppose the weights are uniformly distributed between −w and +w. The standardized input to a hidden neuron is then distributed between $-w\sqrt{N}$ and $+w\sqrt{N}$. We want this to fall between −1 and +1:

$$w = \frac{1}{\sqrt{N}} \quad\Rightarrow\quad -\frac{1}{\sqrt{N}} < w_{ji} < \frac{1}{\sqrt{N}}$$

Similarly, for the hidden-to-output weights, $-\dfrac{1}{\sqrt{n_H}} < w_{kj} < \dfrac{1}{\sqrt{n_H}}$.
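A sketch of this initialization in NumPy; `n_inputs` and `n_hidden` play the roles of N and n_H (bias columns omitted for brevity):

```python
import numpy as np

def init_weights(n_inputs, n_hidden, n_outputs, seed=0):
    """Uniform initialization in (-1/sqrt(fan_in), +1/sqrt(fan_in)) for each layer."""
    rng = np.random.default_rng(seed)
    W_h = rng.uniform(-1, 1, (n_hidden, n_inputs)) / np.sqrt(n_inputs)
    W_o = rng.uniform(-1, 1, (n_outputs, n_hidden)) / np.sqrt(n_hidden)
    return W_h, W_o
```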

Some Useful Tricks

• Normalizing inputs – know when and when not to normalize
• Normalizing each input pattern so that it is of unit length is commonplace, but often inappropriate:

$$X_p \leftarrow \frac{X_p}{\|X_p\|}$$

Some Useful Tricks

• Use of problem-specific information (if known) speeds up convergence and improves generalization
• In networks designed for translation-invariant visual image classification, building in translation invariance as a constraint on the weights helps
• If we know the function to be approximated is smooth, we can build that in as part of the criterion to be minimized – minimize, in addition to the error, the gradient of the error with respect to the inputs

Some Useful Tricks

• Manufacture training data – train networks with translated and rotated patterns if translation- and rotation-invariant recognition is desired
• Incorporate hints during training
• Hints are used as additional outputs during training to help shape the hidden layer representation (e.g., hint nodes for vowels versus consonants in training a phoneme recognizer)

Some Useful Tricks

• Reducing the effective number of free parameters (degrees of freedom) helps improve generalization
• Regularization
• Preprocess the data to reduce the dimensionality of the input:
  • Train a neural network with output the same as the input, but with fewer hidden neurons than the number of inputs
  • Use the hidden layer outputs as inputs to a second network that does the function approximation

Some Useful Tricks

• The choice of an appropriate error function is critical – do not blindly minimize the sum squared error; there are many cases where other criteria are appropriate
• Example:

$$E_S(W) = \sum_{p=1}^{P}\sum_{k=1}^{M} t_{kp}\,\ln\!\left(\frac{t_{kp}}{z_{kp}}\right)$$

is appropriate for minimizing the distance between the target probability distribution over the M output variables and the probability distribution represented by the network.
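A sketch of this criterion in NumPy; `t` and `z` are the target and network distributions over the M outputs for one pattern, and the mask plus the small `eps` guard are implementation choices, not part of the slide's formula:

```python
import numpy as np

def kl_error(t, z, eps=1e-12):
    """E = sum_k t_k * ln(t_k / z_k); terms with t_k = 0 contribute 0."""
    t, z = np.asarray(t), np.asarray(z)
    mask = t > 0
    return float(np.sum(t[mask] * np.log(t[mask] / (z[mask] + eps))))
```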

Some Useful Tricks

• Interpreting the outputs as class conditional probabilities
• Use exponential output nodes:

$$n_{kp} = \sum_{j=0}^{n_H} w_{kj}\, y_{jp}$$

$$z_{kp} = \frac{n_{kp}}{\sum_{l=1}^{M} n_{lp}} \ \text{(linear output)}, \qquad z_{kp} = \frac{e^{n_{kp}}}{\sum_{l=1}^{M} e^{n_{lp}}} \ \text{(exponential output)}$$
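A sketch of the exponential (softmax) output in NumPy; subtracting the maximum before exponentiating is a standard numerical-stability trick, not something the slide prescribes:

```python
import numpy as np

def softmax(n):
    """Exponential outputs z_k = exp(n_k) / sum_l exp(n_l)."""
    e = np.exp(n - np.max(n))   # shift for numerical stability
    return e / e.sum()
```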

Bayes classification and Neural Networks

With 1-of-M target coding,

$$t_k(X) = 1 \ \text{if} \ X \in \omega_k \ \text{(the target class)}, \qquad t_k(X) = 0 \ \text{if} \ X \notin \omega_k$$

Let $g_k(X; W)$ denote the k-th output for input X. Then

$$E_S(W) = \sum_k \sum_p \big(g_k(X_p; W) - t_{kp}\big)^2 = \sum_k \left[\sum_{p:\, X_p \in \omega_k} \big(g_k(X_p; W) - 1\big)^2 + \sum_{p:\, X_p \notin \omega_k} \big(g_k(X_p; W)\big)^2\right]$$

Bayes classification and Neural Networks

$$\lim_{|S| \to \infty} \frac{1}{|S|}\, E_S(W) = \sum_k \int \big(g_k(X; W) - P(\omega_k \mid X)\big)^2\, p(X)\, dX \;+\; \underbrace{\sum_k \int P(\omega_k \mid X)\big(1 - P(\omega_k \mid X)\big)\, p(X)\, dX}_{\text{independent of } W}$$

Because the generalized delta rule minimizes this quantity with respect to W, we have

$$g_k(X; W) \approx P(\omega_k \mid X)$$

assuming that the network is expressive enough to represent $P(\omega_k \mid X)$.

Radial-Basis Function Networks

• A function is approximated as a linear combination of radial basis functions (RBFs). RBFs capture local behaviors of functions.
• RBFs represent local receptive fields.

Radial Basis Function Networks

• The hidden layer applies a nonlinear transformation from the input space to the hidden space.
• The output layer applies a linear transformation from the hidden space to the output space.

[Figure: inputs x_1, ..., x_m feed H radial basis units φ_1, ..., φ_H, whose outputs are combined with weights w_1, ..., w_H to produce the output y]

Example of a radial basis function

• Hidden units use a radial basis function $\varphi_\sigma\big(\|X - W\|^2\big)$, where W is called the center and σ is called the spread; the center and spread are parameters.
• The output depends on the distance of the input X from the center W.

Radial basis function

• A hidden neuron is more sensitive to data points near its center. This sensitivity may be tuned by adjusting the spread σ.
• Larger spread ⇒ less sensitivity.
• Neurons in the visual cortex have locally tuned frequency responses.

Gaussian Radial Basis Function φ

[Figure: Gaussian RBF curves centered at the center; σ is a measure of how spread out the curve is – large σ gives a wide curve, small σ a narrow one]

Types of φ

• Multiquadrics: $\varphi(r) = \big(r^2 + c^2\big)^{1/2}$, with $c > 0$
• Inverse multiquadrics: $\varphi(r) = \dfrac{1}{\big(r^2 + c^2\big)^{1/2}}$, with $c > 0$
• Gaussian functions: $\varphi(r) = \exp\!\left(-\dfrac{r^2}{2\sigma^2}\right)$, with $\sigma > 0$

where $r = \|X - W\|$.
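These three kernels in NumPy, for reference; `r` is the distance ‖X − W‖:

```python
import numpy as np

def multiquadric(r, c=1.0):
    return np.sqrt(r**2 + c**2)

def inverse_multiquadric(r, c=1.0):
    return 1.0 / np.sqrt(r**2 + c**2)

def gaussian(r, sigma=1.0):
    return np.exp(-r**2 / (2 * sigma**2))
```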

Implementing Exclusive OR using an RBF network

• Input space: the four corners (0,0), (0,1), (1,0), (1,1) of the unit square
• Output space: y ∈ {0, 1}
• Construct an RBF pattern classifier such that:
  (0,0) and (1,1) are mapped to 0, class C1
  (1,0) and (0,1) are mapped to 1, class C2
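A minimal sketch of one such classifier, assuming two Gaussian hidden units with centers at (1,1) and (0,0) (a standard choice for this example); the output weights and threshold below were picked by hand, not learned:

```python
import numpy as np

# Centers of the two Gaussian hidden units; spread chosen so that 2*sigma^2 = 1
t1, t2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])

def rbf_xor(x):
    """Hand-built RBF solution to XOR: threshold a linear combination of two RBFs."""
    x = np.asarray(x, dtype=float)
    phi1 = np.exp(-np.sum((x - t1)**2))
    phi2 = np.exp(-np.sum((x - t2)**2))
    return int(-phi1 - phi2 + 0.95 > 0)   # hand-picked output weights and bias

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, rbf_xor(x))   # prints 0, 1, 1, 0
```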

RBF Learning Algorithm

The centers, output weights, and spreads can all be trained by gradient descent:

$$\Delta w_{ji} = -\eta\,\frac{\partial E_S}{\partial w_{ji}}, \qquad \Delta \alpha_j = -\eta_\alpha\,\frac{\partial E_S}{\partial \alpha_j}, \qquad \Delta \sigma_j = -\eta_\sigma\,\frac{\partial E_S}{\partial \sigma_j}$$

Depending on the specific radial basis function, each derivative can be computed using the chain rule of calculus.

For Gaussian basis functions, the network and its per-sample error are:

$$z_{jp} = e^{-\frac{\|X_p - W_j\|^2}{2\sigma_j^2}}, \qquad y_p = \sum_{j=0}^{L} \alpha_j\, z_{jp}, \qquad E_p = \frac{1}{2}\big(t_p - y_p\big)^2$$

$$X_p = [x_{1p}\, \ldots\, x_{Np}]^T, \qquad W_j = [w_{j1}\, \ldots\, w_{jN}]^T$$

RBF Learning Algorithm

$$\Delta \alpha_j = -\eta_\alpha\,\frac{\partial E_p}{\partial \alpha_j} = \eta_\alpha\,\big(t_p - y_p\big)\, z_{jp}, \qquad \alpha_j \leftarrow \alpha_j + \eta_\alpha\,\big(t_p - y_p\big)\, z_{jp}$$

$$\frac{\partial E_p}{\partial w_{ji}} = \frac{\partial E_p}{\partial y_p} \cdot \frac{\partial y_p}{\partial z_{jp}} \cdot \frac{\partial z_{jp}}{\partial w_{ji}} = -\big(t_p - y_p\big)\,\alpha_j\, z_{jp}\left(\frac{x_{ip} - w_{ji}}{\sigma_j^2}\right)$$

$$w_{ji} \leftarrow w_{ji} + \eta\,\big(t_p - y_p\big)\,\alpha_j\, z_{jp}\left(\frac{x_{ip} - w_{ji}}{\sigma_j^2}\right)$$

RBF Learning Algorithm

$$\frac{\partial E_p}{\partial \sigma_j} = \frac{\partial E_p}{\partial y_p} \cdot \frac{\partial y_p}{\partial \sigma_j} = -\big(t_p - y_p\big)\,\alpha_j\,\frac{\partial z_{jp}}{\partial \sigma_j} = -\big(t_p - y_p\big)\,\alpha_j\left(-\frac{2\, z_{jp}\ln z_{jp}}{\sigma_j}\right)$$

$$\sigma_j \leftarrow \sigma_j - \eta_\sigma\,\big(t_p - y_p\big)\,\alpha_j\left(\frac{2\, z_{jp}\ln z_{jp}}{\sigma_j}\right)$$

(using $\ln z_{jp} = -\|X_p - W_j\|^2 / (2\sigma_j^2)$, so that $\partial z_{jp}/\partial \sigma_j = z_{jp}\,\|X_p - W_j\|^2/\sigma_j^3 = -2\, z_{jp}\ln z_{jp}/\sigma_j$).
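A per-sample sketch of these three updates in NumPy for Gaussian hidden units; the learning rates are illustrative, all parameters must be float arrays, and the log guard is an implementation choice:

```python
import numpy as np

def rbf_step(x_p, t_p, W, alpha, sigma, eta_w=0.05, eta_a=0.05, eta_s=0.01):
    """One gradient step for an RBF net: centers W (L,N), weights alpha (L,), spreads sigma (L,)."""
    z = np.exp(-np.sum((x_p - W)**2, axis=1) / (2 * sigma**2))  # z_j
    err = t_p - alpha @ z                                       # (t_p - y_p)
    d_alpha = eta_a * err * z
    d_W = eta_w * err * (alpha * z / sigma**2)[:, None] * (x_p - W)
    d_sigma = eta_s * err * alpha * (-2.0 * z * np.log(np.maximum(z, 1e-300)) / sigma)
    alpha += d_alpha
    W += d_W
    sigma += d_sigma
    return err
```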

RBF Learning Algorithm (continued)

Some useful facts:

$$\|V\|^2 = V^T V \quad \text{(norm)}$$

$$\|V\|_C^2 = V^T C V \quad \text{(weighted norm)}; \qquad \|V\|_C^2 = \|V\|^2 \ \text{if } C \text{ is the identity matrix}$$

$$\frac{d}{dX}\big(X^T A X\big) = \big(A + A^T\big)X = 2AX \ \text{(when } A \text{ is a symmetric matrix)}$$

With a full covariance in the exponent:

$$z_{jp} = e^{-\frac{1}{2}(X_p - W_j)^T \Sigma_j^{-1} (X_p - W_j)}, \qquad y_p = \sum_{j=0}^{L} \alpha_j\, z_{jp}, \qquad E_p = \frac{1}{2}\big(t_p - y_p\big)^2$$

$$X_p = [x_{1p}\, \ldots\, x_{Np}]^T, \qquad W_j = [w_{j1}\, \ldots\, w_{jN}]^T$$

Exercise: Derive the weight update equations from first principles.

RBF Learning Algorithm

Collecting the three updates:

$$\alpha_j \leftarrow \alpha_j + \eta_\alpha\,\big(t_p - y_p\big)\, z_{jp}$$

$$w_{ji} \leftarrow w_{ji} + \eta\,\big(t_p - y_p\big)\,\alpha_j\, z_{jp}\left(\frac{x_{ip} - w_{ji}}{\sigma_j^2}\right)$$

$$\frac{\partial E_p}{\partial \sigma_j} = -\big(t_p - y_p\big)\,\alpha_j\left(-\frac{2\, z_{jp}\ln z_{jp}}{\sigma_j}\right) \quad\Rightarrow\quad \sigma_j \leftarrow \sigma_j - \eta_\sigma\,\big(t_p - y_p\big)\,\alpha_j\,\frac{2\, z_{jp}\ln z_{jp}}{\sigma_j}$$

RBF Learning Algorithm (continued)

A more general form of radial basis function:

$$z_{jp} = e^{-(X_p - W_j)^T C_j^{-1} (X_p - W_j)}$$

where $C_j^{-1}$ is the inverse of an N×N covariance matrix. Note that the covariance matrix is symmetric.

Exercise: Derive a learning rule for an RBF network with such neurons in the hidden layer and linear neurons in the output layer.
