Tutorial on Neural Network Models for Speech and Image Processing

B. Yegnanarayana
Speech & Vision Laboratory
Dept. of Computer Science & Engineering
IIT Madras, Chennai-600036
[email protected]

WCCI 2002, Honolulu, Hawaii, USA, May 12, 2002


TRANSCRIPT

Page 1

Tutorial on Neural Network Models for Speech and Image Processing

B. Yegnanarayana
Speech & Vision Laboratory
Dept. of Computer Science & Engineering
IIT Madras, Chennai-600036

[email protected]

WCCI 2002, Honolulu, Hawaii, USA
May 12, 2002

Page 2

Need for New Models of Computing for Speech & Image Tasks

• Speech & Image processing tasks

• Issues in dealing with these tasks by human beings

• Issues in dealing with the tasks by machine

• Need for new models of computing in dealing with natural signals

• Need for effective (relevant) computing

• Role of Artificial Neural Networks (ANN)

Page 3

Organization of the Tutorial

Part I Feature extraction and classification problems with speech and image data

Part II Basics of ANN

Part III ANN models for feature extraction and classification

Part IV Applications in speech and image processing

Page 4

PART I

Feature Extraction and Classification Problems in Speech and Image

Page 5

Feature Extraction and Classification Problems in Speech and Image

• Distinction between natural and synthetic signals (unknown model vs known model generating the signal)

• Nature of speech and image data (non-repetitive data, but repetitive features)

• Need for feature extraction and classification

• Methods for feature extraction and models for classification

• Need for nonlinear approaches (methods and models)

Page 6

Speech vs Audio

• Audio (audible) signals (noise, music, speech and other signals)

• Categories of audio signals

– Audio signal vs non-signal (noise)

– Signal from speech production mechanism vs other audio signals

– Non-speech vs speech signals (like with natural language)

Page 7

Speech Production Mechanism

Page 8

Different types of sounds

Page 9

Categorization of sound units

Page 10

Nature of Speech Signal

• Digital speech: Sequence of samples or numbers

• Waveform for word “MASK” (Figure)

• Characteristics of speech signal
  – Excitation source characteristics
  – Vocal tract system characteristics

Page 11

Waveform for the word “mask”

Page 12

Source-System Model of Speech Production

(Block diagram: an impulse-train generator controlled by the pitch period and a random-noise generator feed a voiced/unvoiced switch; the selected excitation u(n), scaled by the gain G, drives a time-varying digital filter with the vocal tract parameters to produce the speech signal s(n))
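Since the slides only name the components, here is a minimal numpy/scipy sketch of this source-system model; the sampling rate, pitch period, gain and all-pole coefficients are illustrative assumptions, not values from the tutorial.

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000                               # sampling rate (Hz), assumed
n = np.arange(int(0.5 * fs))            # 0.5 s of samples

def excitation(voiced, pitch_period=80):
    """Impulse train for voiced sounds, random noise for unvoiced."""
    if voiced:
        u = np.zeros(len(n))
        u[::pitch_period] = 1.0         # one impulse per pitch period
        return u
    return np.random.randn(len(n))      # random-noise generator

G = 0.5                                 # gain
a = [1.0, -1.3, 0.8]                    # illustrative all-pole (vocal tract) coefficients
u = excitation(voiced=True)
s = lfilter([G], a, u)                  # excitation passed through the digital filter
```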

Page 13

Features from Speech Signal (demo)

• Different components of speech (speech, source and system)

• Different speech sound units (Alphabet in Indian Languages)

• Different emotions

• Different speakers

Page 14

Speech Signal Processing Methods

• To extract source-system features and suprasegmental features

• Production-based features

• DSP-based features

• Perception-based features

Page 15

Models for Matching and Classification

• Dynamic Time Warping (DTW)

• Hidden Markov Models (HMM)

• Gaussian Mixture Models (GMM)
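As an illustration of the first matching model listed above, a minimal dynamic time warping sketch follows; the Euclidean local cost and symmetric step pattern are common choices assumed here, not prescribed by the tutorial.

```python
import numpy as np

def dtw_distance(x, y):
    """DTW distance between feature sequences x (Tx, d) and y (Ty, d)."""
    Tx, Ty = len(x), len(y)
    D = np.full((Tx + 1, Ty + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Tx + 1):
        for j in range(1, Ty + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])   # local distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[Tx, Ty]
```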

Page 16

Applications of Speech Processing

• Speech recognition

• Speaker recognition/verification

• Speech enhancement

• Speech compression

• Audio indexing and retrieval

Page 17

Limitations of Feature Extraction Methods and Classification Models

• Fixed frame analysis

• Variability in the implicit pattern

• Not pattern-based analysis

• Temporal nature of the patterns

Page 18

Need for New Approaches

• To deal with ambiguity and variability in the data for feature extraction

• To combine evidence from multiple sources (classifiers and knowledge sources)

Page 19

Images

• Digital image: Matrix of numbers
• Types of images
  – Line sketches, binary, gray level and color
  – Still images, video, multimedia

Page 20

Image Analysis

• Feature extraction
• Image segmentation: Gray level, color, texture
• Image classification

Page 21

Processing of Texture-like Images: 2-D Gabor Filter

A typical Gaussian filter with $\sigma = 30$; a typical Gabor filter with $\sigma = 30$, $\omega = 3.14$ and $\theta = 45^{\circ}$.

The 2-D Gabor filter is a Gaussian envelope modulated by a complex sinusoid:

$f(x, y, \sigma_x, \sigma_y, \omega, \theta) = \frac{1}{2\pi\sigma_x\sigma_y}\, \exp\!\left[-\frac{1}{2}\left(\left(\frac{x}{\sigma_x}\right)^{2} + \left(\frac{y}{\sigma_y}\right)^{2}\right)\right] \exp\!\left[\,j\omega\,(x\cos\theta + y\sin\theta)\right]$
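A small numpy sketch of such a kernel, using the parameter values quoted on the slide; the isotropic sigma and the grid size are simplifying assumptions.

```python
import numpy as np

def gabor_kernel_2d(sigma=30.0, omega=3.14, theta_deg=45.0, size=121):
    """Complex 2-D Gabor kernel: Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    theta = np.deg2rad(theta_deg)
    carrier = np.exp(1j * omega * (x * np.cos(theta) + y * np.sin(theta)))
    return envelope * carrier           # use real/imaginary parts or the magnitude
```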

Page 22

Limitations

• Feature extraction
• Matching
• Classification methods/models

Page 23

Need for New Approaches

• Feature extraction: PCA and nonlinear PCA

• Matching: Stereo images

• Smoothing: Using the knowledge of image and not noise

• Edge extraction and classification: Integration of global and local information or combining evidence

Page 24

PART II

Basics of ANN

Page 25

• Problem solving: Pattern recognition tasks by human and machine

• Pattern vs data

• Pattern processing vs data processing

• Architectural mismatch

• Need for new models of computing

Artificial Neural Networks

Page 26

Biological Neural Networks

• Structure and function: Neurons, interconnections, dynamics for learning and recall

• Features: Robustness, fault tolerance, flexibility, ability to deal with variety of data situations, collective computation

• Comparison with computers: Speed, processing, size and complexity, fault tolerance, control mechanism

• Parallel and Distributed Processing (PDP) models

Page 27

Basics of ANN

• ANN terminology: Processing unit (fig), interconnection, operation and update (input, weights, activation value, output function, output value)

• Models of neurons: MP neuron, perceptron and adaline

• Topology (fig)
• Basic learning laws (fig)

Page 28

Model of a Neuron

Page 29

Topology

Page 30

Basic Learning Laws

Page 31

Activation and Synaptic Dynamic Models

• General activation dynamics model:

  $\dot{x}_i(t) = -A_i x_i(t) + \big(B_i - C_i x_i(t)\big)\big(I_i + f_i(x_i(t))\big) - \big(E_i + D_i x_i(t)\big)\Big(J_i + \sum_{j \ne i} f_j(x_j(t))\, w_{ij}\Big)$

  (passive decay term, excitatory term, inhibitory term)

• Synaptic dynamics model:

  $\dot{w}_{ij}(t) = -w_{ij}(t) + s_i(t)\, s_j(t)$

  (passive decay term and correlation term)

• Stability and convergence

Page 32

Functional Units and Pattern Recognition Tasks

• Feedforward ANN
  – Pattern association
  – Pattern classification
  – Pattern mapping/classification

• Feedback ANN
  – Autoassociation
  – Pattern storage (LTM)
  – Pattern environment storage (LTM)

• Feedforward and Feedback (Competitive Learning) ANN
  – Pattern storage (STM)
  – Pattern clustering
  – Feature map

Page 33

Two Layer Feedforward Neural Network (FFNN)

Page 34

PR Tasks by FFNN

• Pattern association
  – Architecture: Two layers, linear processing, single set of weights
  – Learning: Hebb's (orthogonal) rule, Delta (linearly independent) rule
  – Recall: Direct
  – Limitation: Linear independence, number of patterns restricted to input dimensionality
  – To overcome: Nonlinear processing units, leads to a pattern classification problem

• Pattern classification
  – Architecture: Two layers, nonlinear processing units, geometrical interpretation
  – Learning: Perceptron learning
  – Recall: Direct
  – Limitation: Linearly separable functions, cannot handle hard problems
  – To overcome: More layers, leads to a hard learning problem

• Pattern mapping/classification
  – Architecture: Multilayer (hidden), nonlinear processing units, geometrical interpretation
  – Learning: Generalized delta rule (backpropagation)
  – Recall: Direct
  – Limitation: Slow learning, does not guarantee convergence
  – To overcome: More complex architecture

Page 35

Perceptron Network

• Perceptron classification problem
• Perceptron learning law (a sketch follows below)
• Perceptron convergence theorem
• Perceptron representation problem
• Multilayer perceptron
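A minimal sketch of the perceptron learning law (weights updated only on misclassified examples, with the bias folded into the weight vector); the data, labels and learning rate are placeholders.

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, epochs=100):
    """X: (N, d) inputs, y: (N,) labels in {-1, +1}; returns weights incl. bias."""
    Xb = np.hstack([X, np.ones((len(X), 1))])     # append bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * np.dot(w, xi) <= 0:           # misclassified
                w += eta * yi * xi                # perceptron update
    return w
```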

Page 36

Geometric Interpretation of Perceptron Learning

Page 37

Generalized Delta Rule (Backpropagation Learning)

Output layer: $\Delta w_{kj} = \eta\, \delta_k^{o}\, s_j^{h}$, where $\delta_k^{o} = (b_k - s_k^{o})\, \dot{f}_k^{o}$

Hidden layer: $\Delta w_{ji} = \eta\, \delta_j^{h}\, a_i$, where $\delta_j^{h} = \dot{f}_j^{h} \sum_{k=1}^{K} \delta_k^{o}\, w_{kj}$
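A minimal numpy sketch of one update of the generalized delta rule above, for a single hidden layer with logistic units; the learning rate, layer sizes and squared-error criterion are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(a, b, W_h, W_o, eta=0.1):
    """a: input vector, b: desired output; returns updated (W_h, W_o)."""
    s_h = sigmoid(W_h @ a)                          # hidden-layer outputs
    s_o = sigmoid(W_o @ s_h)                        # output-layer outputs
    delta_o = (b - s_o) * s_o * (1 - s_o)           # delta_k = (b_k - s_k) f'
    delta_h = (W_o.T @ delta_o) * s_h * (1 - s_h)   # backpropagated error
    W_o = W_o + eta * np.outer(delta_o, s_h)        # dw_kj = eta * delta_k * s_j
    W_h = W_h + eta * np.outer(delta_h, a)          # dw_ji = eta * delta_j * a_i
    return W_h, W_o
```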

Page 38

Issues in Backpropagation Learning

• Description and features of error backpropagation

• Performance of backpropagation learning
• Refinements of backpropagation learning
• Interpretation of results of learning
• Generalization
• Tasks with backpropagation network
• Limitations of backpropagation learning
• Extensions to backpropagation

Page 39

PR Tasks by FBNN

• Autoassociation
  – Architecture: Single layer with feedback, linear processing units
  – Learning: Hebb (orthogonal inputs), Delta (linearly independent inputs)
  – Recall: Activation dynamics until stable states are reached
  – Limitation: No accretive behavior
  – To overcome: Nonlinear processing units, leads to a pattern storage problem

• Pattern Storage
  – Architecture: Feedback neural network, nonlinear processing units, states, Hopfield energy analysis
  – Learning: Not important
  – Recall: Activation dynamics until stable states are reached
  – Limitation: Hard problems, limited number of patterns, false minima
  – To overcome: Stochastic update, hidden units

• Pattern Environment Storage
  – Architecture: Boltzmann machine, nonlinear processing units, hidden units, stochastic update
  – Learning: Boltzmann learning law, simulated annealing
  – Recall: Activation dynamics, simulated annealing
  – Limitation: Slow learning
  – To overcome: Different architecture

Page 40

Hopfield Model

• Model
• Pattern storage condition:

  $\operatorname{sgn}\Big(\sum_{j} w_{ij}\, a_{kj}\Big) = a_{ki}$, where $w_{ij} = \frac{1}{N}\sum_{l=1}^{L} a_{li}\, a_{lj}$, for $i = 1, \ldots, N$ and $k = 1, \ldots, L$

• Capacity of Hopfield model: Number of patterns for a given probability of error

• Energy analysis:

  $V = -\frac{1}{2}\sum_{i}\sum_{j} w_{ij}\, s_i\, s_j$, with $\Delta V \le 0$ for every state update

• Continuous Hopfield model: output function $f(x) = \dfrac{1 - e^{-x}}{1 + e^{-x}}$
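A minimal sketch of Hopfield pattern storage and asynchronous recall consistent with the weight and energy expressions above; the patterns and iteration count are illustrative.

```python
import numpy as np

def store(patterns):
    """patterns: (L, N) array of +/-1 vectors; Hebbian weights, zero diagonal."""
    L, N = patterns.shape
    W = patterns.T @ patterns / N
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, s, iters=20):
    """Asynchronous updates: s_i <- sgn(sum_j w_ij s_j)."""
    s = s.copy()
    for _ in range(iters):
        for i in np.random.permutation(len(s)):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

def energy(W, s):
    return -0.5 * s @ W @ s
```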

Page 41

State Transition Diagram

Page 42

Computation of Weights for Pattern Storage

Patterns to be stored (111) and (010).

Results in set of inequalities to be satisfied.

Page 43

Pattern Storage Tasks

• Hard problems: Conflicting requirements on a set of inequalities
• Hidden units: Problem of false minima
• Stochastic update

  Stochastic equilibrium (Boltzmann-Gibbs law): $P(s) = \frac{1}{Z}\, e^{-E(s)/T}$

Page 44

Simulated Annealing

Page 45

Boltzmann Machine

• Pattern environment storage
• Architecture: Visible units, hidden units, stochastic update, simulated annealing
• Boltzmann learning law:

  $\Delta w_{ij} = \frac{\eta}{T}\,\big(p_{ij}^{+} - p_{ij}^{-}\big)$

Page 46

Discussion on Boltzmann Learning

• Expression for Boltzmann learning
  – Significance of $p_{ij}^{+}$ and $p_{ij}^{-}$
  – Learning and unlearning
  – Local property
  – Choice of $\eta$ and initial weights

• Implementation of Boltzmann learning
  – Algorithm for learning a pattern environment
  – Algorithm for recall of a pattern
  – Implementation of simulated annealing
  – Annealing schedule

• Pattern recognition tasks by Boltzmann machine
  – Pattern completion
  – Pattern association
  – Recall from noisy or partial input

• Interpretation of Boltzmann learning
  – Markov property of simulated annealing
  – Clamped-free energy and full energy

• Variations of Boltzmann learning
  – Deterministic Boltzmann machine
  – Mean-field approximation

Page 47

Competitive Learning Neural Network (CLNN)

Output layer with on-center and off-surround connections

Input layer

Page 48

PR Tasks by CLNN

• Pattern storage (STM)
  – Architecture: Two layers (input and competitive), linear processing units
  – Learning: No learning in FF stage, fixed weights in FB layer
  – Recall: Not relevant
  – Limitation: STM, no application, theoretical interest
  – To overcome: Nonlinear output function in FB stage, learning in FF stage

• Pattern clustering (grouping)
  – Architecture: Two layers (input and competitive), nonlinear processing units in the competitive layer
  – Learning: Only in FF stage, competitive learning
  – Recall: Direct in FF stage, activation dynamics until stable state is reached in FB layer
  – Limitation: Fixed (rigid) grouping of patterns
  – To overcome: Train neighbourhood units in competition layer

• Feature map
  – Architecture: Self-organization network, two layers, nonlinear processing units, excitatory neighbourhood units
  – Learning: Weights leading to the neighbourhood units in the competitive layer
  – Recall: Apply input, determine winner
  – Limitation: Only visual features, not quantitative
  – To overcome: More complex architecture

Page 49

Learning Algorithms for PCA networks
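The slide does not fix a particular algorithm; as one standard example, this is a sketch of Oja's rule, a Hebbian learning law whose weight vector converges (for a suitably small eta and zero-mean data) to the first principal component.

```python
import numpy as np

def oja_first_component(X, eta=0.01, epochs=50):
    """X: (N, d) zero-mean data; returns a unit vector near the top principal component."""
    w = np.random.randn(X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X:
            y = w @ x
            w += eta * y * (x - y * w)   # Hebbian term with Oja's weight decay
    return w / np.linalg.norm(w)
```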

Page 50

Self Organization Network

(a) Network structure (b) Neighborhood regions at different times in the output layer

Input layer

Output layer

Page 51

Illustration of SOM

Page 52

PART III

ANN Models for Feature Extraction and Classification

Page 53

Neural Network Architecture and Models for Feature Extraction

• Multilayer Feedforward Neural Network (MLFFNN)

• Autoassociative Neural Networks (AANN)

• Constraint Satisfaction Models (CSM)
• Self Organization Map (SOM)
• Time Delay Neural Networks (TDNN)
• Hidden Markov Models (HMM)

Page 54

Multilayer FFNN

• Nonlinear feature extraction followed by linearly separable classification problem

Page 55

• Complex decision hypersurfaces for classification

• Asymptotic approximation of a posteriori class probabilities

Multilayer FFNN

Page 56

• Radial Basis Function NN: Clustering followed by classification

(Figure: input vector a passes through basis functions $\phi_j(a)$, whose outputs are combined to produce the class labels $c_1, \ldots, c_N$)

Radial Basis Function

Page 57

• Architecture
• Nonlinear PCA
• Feature extraction
• Distribution capturing ability

Autoassociation Neural Network (AANN)

Page 58

Autoassociation Neural Network (AANN)

• Architecture

(Figure: input layer – dimension compression hidden layer – output layer)

Page 59

Distribution Capturing Ability of AANN

• Distribution of feature vector (fig)
• Illustration of distribution in 2D case (fig)
• Comparison with Gaussian Mixture Model (fig)

Page 60

Distribution of feature vector

Page 61

(a) Illustration of distribution in 2D case; (b, c) Comparison with Gaussian Mixture Model

Page 62

Feature Extraction by AANN

• Input and output to AANN: Sequence of signal samples (captures dominant 2nd order statistical features)

• Input and output to AANN: Sequence of Residual samples (captures higher order statistical features in the sample sequence)

Page 63

Constraint Satisfaction Model

• Purpose: To satisfy the given (weak) constraints as much as possible

• Structure: Feedback network with units (hypotheses), connections (constraints / knowledge)

• Goodness of fit function: Depends on the output of unit and connection weights

• Relaxation Strategies: Deterministic and Stochastic

Page 64

Application of CS Models

• Combining evidence
• Combining classifier outputs
• Solving optimization problems

Page 65

Self Organization Map (illustrations)

• Organization of 2D input to 1D feature mapping

• Organization of 16 Dimensional LPC vector to obtain phoneme map

• Organization of large document files
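A minimal training sketch for the first illustration above (2-D inputs organized onto a 1-D map); the map size, learning-rate and neighbourhood schedules are assumptions.

```python
import numpy as np

def train_som_1d(X, n_units=20, epochs=50, eta0=0.5, sigma0=3.0):
    """X: (N, 2) inputs; returns an (n_units, 2) codebook arranged along a 1-D map."""
    W = X[np.random.choice(len(X), n_units)]           # initialize from the data
    idx = np.arange(n_units)
    for t in range(epochs):
        eta = eta0 * (1 - t / epochs)                   # decaying learning rate
        sigma = max(sigma0 * (1 - t / epochs), 0.5)     # shrinking neighbourhood
        for x in X[np.random.permutation(len(X))]:
            winner = np.argmin(np.linalg.norm(W - x, axis=1))
            h = np.exp(-((idx - winner) ** 2) / (2 * sigma ** 2))
            W += eta * h[:, None] * (x - W)             # move winner and its neighbours
    return W
```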

Page 66

Time Delay Neural Networks for Temporal Pattern Recognition

Page 67

Stochastic Models for Temporal Pattern Recognition

• Maximum likelihood formulation: Determine the class w, given the observation symbol sequence y, using the criterion $\hat{w} = \arg\max_{w} P(y \mid w)$

• Markov Models

• Hidden Markov Models

Page 68

PART IV

Applications in Speech & Image Processing

Page 69

Applications in Speech and Image Processing

• Edge extraction in texture-like images

• Texture segmentation/classification by CS model

• Road detection from satellite images

• Speech recognition by CS model

• Speaker recognition by AANN model

Page 70

Problem of Edge Extraction in Texture-like Images

• Nature of texture-like images
• Problem of edge extraction
• Preprocessing (1-D) to derive partial evidence
• Combining evidence using CS model

Page 71

• Texture Edges are the locations where there is an abrupt change in texture properties

Problem of Edge Extraction

Image with 4 natural texture regions

Edge map showing micro edges

Edge map showing macro edges

Page 72

1-D processing using Gabor Filter and Difference Operator

• 1-D Gabor smoothing filter: Magnitude and Phase

  1-D Gabor filter: Gaussian modulated by a complex sinusoid

  $f(x, \sigma, \omega) = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right) e^{j\omega x}$

  Even component: $f_c(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right) \cos(\omega x)$

  Odd component: $f_s(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right) \sin(\omega x)$

Page 73

1-D processing using Gabor filter and Difference operator (contd.)

• Differential operator for edge evidence – first derivative of the 1-D Gaussian function:

  $c(y) = -\frac{y}{\sqrt{2\pi}\,\sigma^{3}}\, \exp\!\left(-\frac{y^{2}}{2\sigma^{2}}\right)$

• Need for a set of Gabor filters

Page 74

Texture Edge Extraction using 1-D Gabor Magnitude and Phase

• Apply 1-D Gabor filter along each of the parallel lines of an image in one direction ( say, horizontal )

• Apply all Gabor filters of the filter bank in a similar way

• For each of the Gabor filtered output, partial edge information is extracted by applying the 1-D differential operator in the orthogonal direction ( say, vertical )

• The entire process is repeated in the orthogonal (vertical and horizontal) directions to obtain the partial edge evidence in the other direction

• The partial edge evidence is combined using a Constraint Satisfaction Neural Network Model
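A compact numpy/scipy sketch of the row-wise Gabor filtering and orthogonal differencing described above, for one filter and one direction; the filter parameters are illustrative and the CSNN combination step is omitted.

```python
import numpy as np
from scipy.ndimage import convolve1d

def gabor_1d(sigma, omega, half=30):
    x = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return g * np.exp(1j * omega * x)

def d_gaussian_1d(sigma, half=15):
    y = np.arange(-half, half + 1, dtype=float)
    return -y * np.exp(-y**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma**3)

def partial_edge_evidence(image, sigma_g=8.0, omega=0.3, sigma_d=2.0):
    """image: 2-D float array; Gabor magnitude along rows, derivative along columns."""
    h = gabor_1d(sigma_g, omega)
    resp = convolve1d(image, h.real, axis=1) + 1j * convolve1d(image, h.imag, axis=1)
    magnitude = np.abs(resp)                         # 1-D Gabor magnitude along each row
    d = d_gaussian_1d(sigma_d)
    return np.abs(convolve1d(magnitude, d, axis=0))  # edge evidence in the orthogonal direction
```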

Page 75

Texture Edge Extraction using a set of 1-D Gabor Filters

(Block diagram: input image → bank of 1-D Gabor filters → filtered images → post-processing using 1-D differential operator and thresholding → edge evidence → combining the edge evidence using the Constraint Satisfaction Neural Network model → edge map)

Page 76

Structure of 3-D CSNN Model

(Figure: a 3-D lattice of nodes of size I × J × K; +ve connections among the nodes across the layers for each pixel, and -ve connections from a set of neighboring nodes to each node in the same layer)

Combining Evidence using CSNN model

Page 77

Combining the Edge Evidence using Constraint Satisfaction Neural Network (CSNN) Model

• Neural network model contains nodes arranged in a 3-D lattice structure

• Each node corresponds to a pixel in the post-processed Gabor filter output

• Post processed output of a single 1-D Gabor filter is an input to one 2-D layer of nodes

• Different layers of nodes, each corresponding to a particular filter output, are stacked one upon the other to form the 3-D structure

• Each node represents a hypothesis
• Connection between two nodes represents a constraint

• Each node is connected to other nodes with inhibitory and excitatory connections

Page 78

Let $W_{i,j,k,\,i_1,j_1,k}$ represent the weight of the connection from node (i,j,k) to node (i1,j1,k) within each layer k, and let $W_{i,j,k,\,i,j,k_1}$ represent the constraint between the nodes in two different layers (k and k1) in the same column. The within-layer weights take piecewise values of magnitude 1/8 and 1/16, depending on the relative position (i - i1, j - j1) of the two nodes in the neighbourhood, and the across-layer weights are

  $W_{i,j,k,\,i,j,k_1} = \frac{1}{2(K-1)}$

• The node is connected to other nodes in the same column with excitatory connections

Combining Evidence using CSNN model (contd.)

Page 79

• Let $\psi_{i,j,k} \in \{0, 1\}$ denote the output of the node (i,j,k), and the set $\Psi = \{\psi_{i,j,k}\}$ the state of the network

• The state of the neural network model is initialized using:

  $\psi_{i,j,k}(0) = 1$ if the pixel has evidence of an edge pixel, and $0$ otherwise

• In the deterministic relaxation method, the state of the network is updated iteratively by changing the output of one node at a time

• The net input to each node is obtained using:

  $U_{i,j,k}(n) = \sum_{i_1, j_1} W_{i,j,k,\,i_1,j_1,k}\, \psi_{i_1,j_1,k} + \sum_{k_1} W_{i,j,k,\,i,j,k_1}\, \psi_{i,j,k_1} + I_{i,j,k}$

  where $U_{i,j,k}(n)$ is the net input to node (i,j,k) at the nth iteration, and $I_{i,j,k}$ is the external input given to the node (i,j,k)

• The state of the network is updated using:

  $\psi_{i,j,k}(n+1) = 1$ if $U_{i,j,k}(n) \ge \theta$, and $0$ otherwise,

  where $\theta$ is the threshold

Combining Evidence using CSNN model (contd.)
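For concreteness, a small numpy sketch of this deterministic relaxation on a 3-D lattice of binary hypothesis nodes; the weight values, threshold and 8-neighbourhood used here are simplified placeholders rather than the exact constraint weights of the previous slide.

```python
import numpy as np

def relax(psi, external, w_intra=-0.125, w_inter=0.5, theta=0.0, iters=10):
    """psi: (I, J, K) 0/1 state initialized from edge evidence; external: (I, J, K) inputs."""
    I, J, K = psi.shape
    for _ in range(iters):
        for i in range(I):
            for j in range(J):
                for k in range(K):
                    nb = psi[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2, k]
                    u = w_intra * (nb.sum() - psi[i, j, k])             # same-layer neighbours
                    u += w_inter * (psi[i, j, :].sum() - psi[i, j, k])  # same column, other layers
                    u += external[i, j, k]
                    psi[i, j, k] = 1 if u >= theta else 0               # one node updated at a time
    return psi
```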

Page 80

Comparison of Edge Extraction using Gabor Magnitude and Gabor Phase

(Figure panels: texture image, and edge maps obtained with the 2-D Gabor filter, the 1-D Gabor magnitude, and the 1-D Gabor phase)

Page 81

Texture Segmentation and Classification

• Image analysis (revisited)
• Problem of texture segmentation and classification
• Preprocessing using 2D Gabor filter to derive feature vector
• Combining the partial evidence using CS model

Page 82

CS Model for Texture Classification

• Supervised and unsupervised problem
• Modeling of image constraints
• Formulation of a posteriori probability CS model
• Hopfield neural network model and its energy function
• Deterministic and stochastic relaxation strategies

Page 83

CS Model for Texture Classification - Modeling of Image Constraints

• Feature formation process: Defined by the conditional probability of the feature vector $g_s$ of each pixel $s$ given the model parameters of each class $k$:

  $P(G_s = g_s \mid L_s = k) = \frac{1}{(2\pi\sigma_k^2)^{M/2}}\, \exp\!\left(-\frac{\lVert g_s - \mu_k \rVert^{2}}{2\sigma_k^{2}}\right)$

• Partition process: Defines the probability of the label of a pixel given the label of the pixels in its pth order neighborhood.

  $P(L_s \mid L_r,\ r \in N_p(s)) = \frac{1}{Z_p}\, \exp\!\Big(-\!\sum_{r \in N_p(s)} V(L_s, L_r)\Big)$, with $V(\cdot,\cdot)$ a pairwise potential over the pth-order neighbourhood

• Label competition process: Describes the conditional probability of assigning a new label to an already labeled pixel

  $P(L_s = k \mid L_s = l) = \frac{1}{Z_c}\, e^{-V_c(k,\, l)}$, with $V_c(\cdot,\cdot)$ the label-competition potential

Page 84

CS Model for Texture Classification - Modeling of Image Constraints (contd.)

• Formulation of the a posteriori probability:

  $P(L_s = k \mid G_s = g_s,\ L_r\ (r \in N_p(s)),\ L_s = l) = \frac{1}{Z}\, e^{-E(L_s = k \mid G_s = g_s,\ L_r,\ L_s = l)}$

  where the energy adds the contributions of the feature formation, partition and label competition processes,

  $E(L_s = k \mid \cdot) = \frac{\lVert g_s - \mu_k \rVert^{2}}{2\sigma_k^{2}} + \frac{M}{2}\ln(2\pi\sigma_k^{2}) + \sum_{r \in N_p(s)} V(L_s, L_r) + V_c(k, l)$

  and $Z = Z_p\, Z_c\, P(G_s = g_s)\, P(L_s = k)$

• Total energy of the system:

  $E_{\text{total}} = \sum_{s,k} E(L_s = k \mid G_s = g_s,\ L_r,\ L_s = l)$

Page 85

CS Model for Texture Classification

(Figure: a 3-D lattice of nodes of size I × J × K, one layer per texture class k; the nodes (i,j,1), ..., (i,j,k), ..., (i,j,K) in a column correspond to one pixel, with +ve connections among the nodes across the layers for each pixel and -ve connections from a set of neighboring nodes to each node in the same layer; E denotes the energy of a network state)

Page 86

Hopfield Neural Network and its Energy Function

$E_{\text{Hopfield}} = -\frac{1}{2}\sum_{i}\sum_{i_1} W_{i,i_1}\, O_i\, O_{i_1} - \sum_{i} B_i\, O_i$

For the 3-D lattice of the CS model:

$E_{\text{Hopfield}} = -\frac{1}{2}\sum_{i,j,k}\sum_{i_1,j_1,k_1} W_{i,j,k,\,i_1,j_1,k_1}\, O_{i,j,k}\, O_{i_1,j_1,k_1} - \sum_{i,j,k} B_{i,j,k}\, O_{i,j,k}$

(Figure: Hopfield network with unit outputs $o_1, \ldots, o_j, \ldots, o_N$ and biases $B_1, \ldots, B_j, \ldots, B_N$, mapped onto the I × J × K lattice)

Page 87

Results of Texture Classification - Natural Textures

(Figure panels: image with natural textures, initial classification, final classification)

Page 88

Results of Texture Classification - Remote Sensing Data

(Figure panels: Band-2 IRS image containing 4 texture classes, initial classification, final classification)

Page 89

Results of Texture Classification - Multispectral Data

(Figure panels: SIR-C/X-SAR image of the Lost City of Ubar; classification using multispectral information; classification using multispectral and textural information)

Page 90

Speech Recognition using CS Model

• Problem of recognition of SCV units (Table)
• Issues in classification of SCVs (Table)
• Representation of isolated utterance of SCV unit:
  – 60 ms before and 140 ms after vowel onset point
  – 240-dimensional feature vector consisting of weighted cepstral coefficients
• Block diagram of the recognition system for SCV units (Fig)
• CS network for classification of SCV units (Fig)

Page 91

Problem of Recognition of SCV Units

Page 92

Issues in Classification of SCVs

• Importance of SCVs
  – High frequency of occurrence: About 45%
• Main issues in classification of SCVs
  – Large number of SCV classes
  – Similarity among several SCV classes
• Model for classification of SCVs
  – Should have good discriminatory capability (artificial neural networks)
  – Should be able to handle a large number of classes (neural networks based on a modular approach)

Page 93

Block Diagram of Recognition System for SCV Units

Page 94

CS Network for Classification of SCV Units

(Figure: CS network with Vowel, MOA and POA feedback subnetworks; the external evidence or bias for a node in each subnetwork is computed using the output of the corresponding MLFFNN, e.g., MLFFNN1, MLFFNN5 and MLFFNN9)

Page 95

Classification Performance of CSM and other SCV Recognition Systems on Test Data of 80 SCV Classes

Performance (%) under decision criteria Case 1 to Case 4:

SCV Recognition System           Case 1   Case 2   Case 3   Case 4
HMM based system                  45.5     59.2     65.9     71.4
80-class MLFFNN                   45.3     59.7     66.9     72.2
MOA modular network               29.2     50.2     59.0     65.3
POA modular network               35.1     56.9     69.5     76.6
Vowel modular network             30.1     47.5     58.8     63.6
Combined evidence based system    51.6     63.5     70.7     74.5
Constraint Satisfaction model     65.6     75.0     80.2     82.6

Page 96

Speaker Verification using AANN Models and Vocal Tract System Features

• One AANN for each speaker
• Verification by identification
• AANN structure: 19L 38N 4N 38N 19L (see the sketch below)
• Feature: 19 weighted LPCC from 16th-order LPC for each frame of 27.5 ms, with frame shift of 13.75 ms
• Training: Pattern mode, 100 epochs, 1 min of data
• Testing: Model giving highest confidence for 10 sec of test data
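A sketch of the 19L 38N 4N 38N 19L autoassociative structure in tf.keras (an assumption; any framework would do); the tanh/linear activations mirror the N/L notation of the slide, while the optimizer and other training details are simplified stand-ins for the backpropagation training described in the tutorial.

```python
import tensorflow as tf

def build_aann():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(38, activation="tanh", input_shape=(19,)),  # 38N
        tf.keras.layers.Dense(4, activation="tanh"),                      # 4N compression layer
        tf.keras.layers.Dense(38, activation="tanh"),                     # 38N
        tf.keras.layers.Dense(19, activation="linear"),                   # 19L output = 19L input
    ])
    model.compile(optimizer="adam", loss="mse")   # trained to reconstruct its own input
    return model

# Training: model.fit(features, features, epochs=100) on one speaker's feature vectors.
# Testing: compute the reconstruction error of the test features under each speaker's
# model and select the model giving the highest confidence (lowest error).
```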

Page 97

Speaker Recognition using Source Features

• One model for each speaker
• Structure of AANN: 40L 48N 12N 48N 40L
• Training data: About 10 sec of data, 60 epochs
• Testing: Select model giving highest confidence for 2 sec of test data

Page 98

Other Applications

• Speech enhancement
• Speech compression
• Image compression
• Character recognition
• Stereo image matching

Page 99

Summary and Conclusions

• Speech and image processing: Natural tasks
• Significance of pattern processing
• Limitation of conventional computer architecture
• Need for new models or architectures for pattern processing tasks
• Basics of ANN
• Architecture of ANN for feature extraction and classification
• Potential of ANN for speech and image processing

Page 100

References

1. B.Yegnanarayana, “ Artificial Neural Networks”, Prentice-Hall of India, New Delhi, 1999

2. L. R. Rabiner and B. H. Juang, “Fundamentals of Speech Recognition”, Prentice-Hall, New Jersey, 1993

3. Alan C. Bovik, Handbook of Image and Video Processing, Academic Press, 2001

4. Xuedong Huang, Alex Acero and Hsiao-Wuen Hon, “Spoken Language Processing”, Prentice-Hall, New Jersey, 2001

5. P. P. Raghu, “Artificial Neural Network Models for Texture Analysis”, PhD Thesis, CSE Dept., IIT Madras, 1995

6. C. Chandra Sekhar, “Neural Network Models for Recognition of Stop Consonant Vowel (SCV) Segments in Continuous Speech”, PhD Thesis, CSE Dept., IIT Madras, 1996

7. P. Kiran Kumar, “Texture Edge Extraction using One Dimensional Processing”, MS Thesis, CSE Dept., 2001

8. S. P. Kishore, “Speaker Verification using Autoassociative Neural Network Models”, MS Thesis, CSE Dept., IIT Madras, 2000

9. B. Yegnanarayana, K. Sharath Reddy and S. P. Kishore, “Source and System Features for Speaker Recognition using AANN Models”, ICASSP, May 2001

10. S. P. Kishore, Suryakanth V. Gangashetty and B. Yegnanarayana, “Online Text Independent Speaker Verification System using Autoassociative Neural Network Models”, INNS-IEEE Int. Conf. Neural Networks, July 2001.

11. K. Sharat Reddy, “Source and System Features for Speaker Recognition”, MS Thesis, CSE Dept., IIT Madras, September 2001.

12. B. Yegnanarayana and S. P. Kishore, “Autoassociative Neural Networks: An Alternative to GMM for Pattern Recognition”, to appear in Neural Networks, 2002.