Pattern recognition – basic concepts

Upload: livia

Post on 23-Feb-2016



TRANSCRIPT

Page 1: Pattern recognition – basic concepts

Pattern recognition – basic concepts

Page 2: Pattern recognition – basic concepts

        |------------------ Input attributes ------------------|   Output attribute
Inst.   Sepal Length   Sepal Width   Petal Length   Petal Width    Species
1       5.1            3.5           1.4            0.2            setosa
2       4.9            3             1.4            0.2            setosa
3       4.7            3.2           1.3            0.2            setosa
4       4.6            3.1           1.5            0.2            setosa
5       5              3.6           1.4            0.2            setosa

Each row of the table is one sample.

• input attribute, attribute, feature, input variable, independent variable (Czech: atribut, rys, příznak, vstupní proměnná, nezávislá proměnná)

• class, output variable, dependent variable (Czech: třída, výstupní proměnná, závislá proměnná)

• sample (Czech: vzorek)

Page 3: Pattern recognition – basic concepts

Handwritten digits

Page 4: Pattern recognition – basic concepts

• each digit is 28 x 28 pixels
  – so each digit can be represented by a vector x comprising 784 real numbers
• goal:
  – build a machine that will take x as input and will produce the identity of the digit 0 … 9 as the output
  – non-trivial problem due to the wide variability of handwriting
  – could be tackled by using rules for distinguishing the digits based on the shapes of the strokes
  – in practice such an approach leads to a proliferation of rules and of exceptions to the rules and so on, and invariably gives poor results
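The representation of a 28 x 28 digit as a vector of 784 numbers can be sketched as follows. This is only an illustration: the image here is filled with random values standing in for real pixel intensities, and the row-by-row ordering is an assumption (any fixed ordering works as long as it is used consistently).

```python
import numpy as np

# Hypothetical 28 x 28 grayscale image; random values stand in for real pixels.
rng = np.random.default_rng(0)
image = rng.random((28, 28))

# Flatten row by row into a single input vector x with 28 * 28 = 784 entries.
x = image.reshape(-1)

print(x.shape)  # (784,)
```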

Page 5: Pattern recognition – basic concepts

• better way: adopt a machine learning algorithm (i.e. use some adaptive model)

Figure: input → model. The internal parameters influencing the behavior of the model must be adjusted.

Page 6: Pattern recognition – basic concepts

• tune the parameters of the model using the training set
  – the training set is a data set of N digits {x1, …, xN}

• the categories of the digits in the training set are known in advance (inspected manually); they form a target vector t, one for each digit

Example target vector t for the digit 3 (one position per digit 0 … 9):

digit:  0 1 2 3 4 5 6 7 8 9
t:      0 0 0 1 0 0 0 0 0 0
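A target vector of this kind (a 1 at the position of the correct digit, 0 elsewhere) can be built as in the minimal sketch below. The helper name `one_hot` and its parameter `num_classes` are chosen here for illustration and do not come from the slides.

```python
import numpy as np

def one_hot(digit, num_classes=10):
    """Return a target vector with a 1 at position `digit` and 0 elsewhere."""
    t = np.zeros(num_classes, dtype=int)
    t[digit] = 1
    return t

t = one_hot(3)
print(t.tolist())  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

Taking `np.argmax` of such a vector recovers the digit, which is also a common way to read a class off a model's output vector y.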

Page 7: Pattern recognition – basic concepts

The result of running the machine learning algorithm can be expressed as a function y(x), which takes a new digit image x as input and generates an output vector y, encoded in the same way as the target vectors.

Figure: inputs x1, x2, x3, x4, x5, x6, …, x784 → y(x) → vector y

Page 8: Pattern recognition – basic concepts

• The precise form of the function y(x) is determined during the training phase (learning phase).
  – The model adapts its parameters (i.e. learns) on the basis of the training data {x1, …, xN}.

• The trained model can then determine the identity of new, previously unseen digit images, which are said to comprise a test set.

• The ability to categorize correctly new examples that differ from those used for training is known as generalization.

Page 9: Pattern recognition – basic concepts

For most practical applications, the original input variables are typically preprocessed to transform them into some new space of variables where, it is hoped, the pattern recognition problem will be easier to solve.

Figure: inputs x1, x2, x3, x4, x5, x6, …, x784 → Preprocessing → y(x) → vector y

Page 10: Pattern recognition – basic concepts

• Translate and scale the images of the digits so that each digit is contained within a box of a fixed size.

• This greatly reduces the variability within each digit class, because the location and scale of all the digits are now the same.

• This pre-processing stage is sometimes also called feature extraction.

• Test data must be pre-processed using the same steps as the training data.
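One possible way to translate and scale a digit into a box of fixed size is sketched below. It is an assumption-laden illustration, not the procedure used for any particular data set: the function name `normalize_digit`, the box size of 20 pixels, and the nearest-neighbour resampling (which stretches the digit and does not preserve its aspect ratio) are all choices made here. The point is that the same function with the same settings is applied to training and test images.

```python
import numpy as np

def normalize_digit(image, size=20):
    """Crop the digit to its bounding box and rescale it to size x size."""
    rows = np.flatnonzero(image.any(axis=1))
    cols = np.flatnonzero(image.any(axis=0))
    if rows.size == 0:                      # empty image: nothing to crop
        return np.zeros((size, size), dtype=image.dtype)
    crop = image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    # Nearest-neighbour resampling: pick a source pixel for each target pixel.
    r_idx = np.arange(size) * crop.shape[0] // size
    c_idx = np.arange(size) * crop.shape[1] // size
    return crop[np.ix_(r_idx, c_idx)]

# The same L-shaped stroke drawn at two different positions in a 28 x 28 image.
a = np.zeros((28, 28))
a[2:12, 3] = 1.0
a[11, 3:9] = 1.0
b = np.zeros((28, 28))
b[15:25, 18] = 1.0
b[24, 18:24] = 1.0

train_like = normalize_digit(a)
test_like = normalize_digit(b)              # same function, same settings
print(np.array_equal(train_like, test_like))  # True
```

After preprocessing, the two images are identical, so the location of the stroke no longer contributes to the variability the model has to cope with.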

Page 11: Pattern recognition – basic concepts

Feature selection and feature extraction

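Figure with two panels:

selection: original features x1, x2, x3, x4, x5, x6, …, x784 → a subset of the original features, e.g. x1, x5, x103, x456

extraction: original features x1, x2, x3, x4, x5, x6, …, x784 → new features x*1, x*2, x*3, x*4, x*5, x*6, …, x*784 → a subset of the new features, e.g. x*18, x*152, x*309, x*666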
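The difference between the two can be sketched in a few lines. The data below are random and purely hypothetical, and principal component analysis (computed via the SVD) is used only as one assumed example of an extraction method; the slides do not name a specific technique.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 784))          # hypothetical data: 100 samples, 784 features

# Feature selection: keep a subset of the original columns unchanged.
# Indices are 0-based, so these correspond to x1, x5, x103 and x456.
selected = [0, 4, 102, 455]
X_sel = X[:, selected]

# Feature extraction (PCA as an assumed example): build new features as
# linear combinations of all original ones, then keep the first few.
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_ext = X_centered @ Vt[:4].T

print(X_sel.shape, X_ext.shape)  # (100, 4) (100, 4)
```

Both results have four columns, but the selected columns are original measurements, whereas each extracted column mixes all 784 original features.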

Page 12: Pattern recognition – basic concepts

Dimensionality reduction

• We want to reduce the number of dimensions because:
  – Efficiency
    • measurement costs
    • storage costs
    • computation costs
  – Problem may be solved more easily in the new space
  – Improved classification performance
  – Ease of interpretation

Page 13: Pattern recognition – basic concepts

Curse of dimensionality

Bishop, Pattern Recognition and Machine Learning

Page 14: Pattern recognition – basic concepts

Supervised learning

• training data comprises examples of the input vectors along with their corresponding target vectors (e.g. digit recognition)
  – classification: the aim is to assign an input vector to one of a finite number of discrete categories
  – regression (data/curve fitting): the desired output consists of one or more continuous variables

Page 15: Pattern recognition – basic concepts

Unsupervised learning

• training data consists of a set of input vectors x without any corresponding target value

• goals:
  – discover groups of similar examples within the data (clustering)
  – project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization
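As an illustration of clustering without target values, here is a minimal k-means sketch. The slides do not name a clustering algorithm, so k-means is an assumption, as are the function name `kmeans`, the farthest-point initialization, and the two synthetic groups of points used as data.

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    """Plain k-means (Lloyd's algorithm): returns labels and centroids."""
    # Farthest-point initialization: start at X[0], then repeatedly add the
    # sample that is farthest from all centroids chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its samples (keep it if empty).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two hypothetical groups of 2-D points; no target values are given.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(10.0, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(labels[:3], labels[-3:])
```

The algorithm sees only the input vectors; the group membership it returns is discovered from the data rather than learned from targets.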

Page 16: Pattern recognition – basic concepts

Polynomial curve fitting

• regression problem, supervised
• we observe a real-valued input variable x and we wish to use this observation to predict the value of a real-valued target variable t
• artificial example: the targets are generated as sin(2πxn) + random noise
• training set: N observations of x written as x = (x1, …, xN)^T, together with the corresponding observations of the values of t: t = (t1, …, tN)^T
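A data set of this kind can be generated as in the sketch below. The function name `make_data`, the evenly spaced inputs on [0, 1], the Gaussian noise with standard deviation 0.3, and the default of 10 points are assumptions made for illustration.

```python
import numpy as np

def make_data(n=10, noise_std=0.3, seed=0):
    """Return inputs x and noisy targets t = sin(2*pi*x) + noise."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, noise_std, size=n)
    return x, t

x, t = make_data()
print(x.shape, t.shape)  # (10,) (10,)
```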

Page 17: Pattern recognition – basic concepts

Figure (adapted from Bishop, Pattern Recognition and Machine Learning): training data set {x, t} with N = 10 points; each input xn is paired with a target tn generated as sin(2πxn) + random noise (x1 -> t1, x2 -> t2, etc.).

Page 18: Pattern recognition – basic concepts

• goal: exploit the training set in order to predict the value t’ of the target variable for a new value x’ of the input variable

• this is generally difficult, as
  – we have to generalize from a finite data set
  – the data are corrupted by noise, so for a given x’ there is uncertainty in the value of t’

Figure (adapted from Bishop, Pattern Recognition and Machine Learning): a new input x’ and the target value t’ to be predicted for it.

Page 19: Pattern recognition – basic concepts

• decision: which method to use?
• From the plethora of possibilities (which you do not know about yet) I chose a simple one: the data will be fitted using this polynomial function

$$y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$

• polynomial coefficients w0, …, wM form a vector w
  – they represent the parameters of the model that must be set in the training phase
  – the polynomial model is still linear regression, because y(x, w) is linear in the coefficients w even though it is nonlinear in x
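The polynomial can be evaluated as a matrix-vector product, which also makes its linearity in w visible. The sketch below is illustrative only: the names `design_matrix` and `poly` and the coefficient values are made up here.

```python
import numpy as np

def design_matrix(x, M):
    """Columns are x**0, x**1, ..., x**M, one row per input value."""
    return np.vander(np.atleast_1d(x), M + 1, increasing=True)

def poly(x, w):
    """Evaluate y(x, w) = sum_j w[j] * x**j for every entry of x."""
    return design_matrix(x, len(w) - 1) @ w

w = np.array([1.0, 2.0, 3.0])        # hypothetical coefficients, M = 2
x = np.array([0.0, 1.0, 2.0])
print(poly(x, w).tolist())           # [1.0, 6.0, 17.0]

# Linear in w: doubling the coefficients doubles every prediction.
print(np.allclose(poly(x, 2 * w), 2 * poly(x, w)))  # True
```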

Page 20: Pattern recognition – basic concepts

• The values of coefficients will be determined by minimizing the error function

• It measures the misfit between the function y(x, w) and the training set data points

• one simple choice: the sum of squared errors (SSE) between the predictions y(xn, w) for each data point xn and the corresponding target values tn

$$\mathrm{SSE} = \frac{1}{2} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2$$
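A direct translation of this error function into code might look as follows. The function name `sse` and the small data set are invented for illustration; the coefficient vector is ordered from w0 upwards, matching `numpy.polynomial.polynomial.polyval`.

```python
import numpy as np

def sse(w, x, t):
    """Half the sum of squared differences between y(x_n, w) and t_n."""
    y = np.polynomial.polynomial.polyval(x, w)   # y(x_n, w) for every n
    return 0.5 * np.sum((y - t) ** 2)

x = np.array([0.0, 1.0, 2.0])
t = np.array([1.0, 3.0, 5.0])                # hypothetical targets, t = 1 + 2x

print(sse(np.array([1.0, 2.0]), x, t))       # 0.0  (perfect fit)
print(sse(np.array([0.0, 2.0]), x, t))       # 1.5  (each prediction is off by 1)
```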

Page 21: Pattern recognition – basic concepts

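Figure (Bishop, Pattern Recognition and Machine Learning): the fitted function y(x, w) drawn through the training points; each vertical segment shows the displacement of a data point tn from the function y(xn, w).

$$\mathrm{SSE} = \frac{1}{2} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2$$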

Page 22: Pattern recognition – basic concepts

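• solving the curve fitting problem means choosing the value of w for which the error function E(w) (here the SSE) is as small as possible; this value w* gives the fitted function y(x, w*)
  – as small as possible means finding a minimum of E(w), i.e. a point where its derivatives with respect to the coefficients w are zero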
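Because the model is linear in w, the sum of squared errors is quadratic in w and its gradient is Φᵀ(Φw − t), where Φ is the matrix with entries xn^j. Setting the gradient to zero gives a linear least-squares problem. The sketch below solves it with `np.linalg.lstsq` and checks that the gradient vanishes at the solution; the function names, the degree M = 3, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def fit_polynomial(x, t, M):
    """Return w* minimizing the sum of squared errors for degree M."""
    Phi = np.vander(x, M + 1, increasing=True)   # Phi[n, j] = x_n ** j
    w_star, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w_star

def sse_gradient(w, x, t):
    """Gradient of 0.5 * sum((Phi @ w - t)**2) with respect to w."""
    Phi = np.vander(x, len(w), increasing=True)
    return Phi.T @ (Phi @ w - t)

# Hypothetical training set: 10 noisy samples of sin(2*pi*x).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=10)

w_star = fit_polynomial(x, t, M=3)
print(np.allclose(sse_gradient(w_star, x, t), 0.0, atol=1e-8))  # True
```

Any other coefficient vector gives an error at least as large as the one at `w_star`, which is what "a minimum of E(w)" means here.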