Sequence Classification Using Statistical Pattern Recognition

José Antonio Iglesias, Agapito Ledezma, and Araceli Sanchis

Computer Science Department, Universidad Carlos III de Madrid

Avda. de la Universidad, 30. 28911 Leganés, Spain
{jiglesia, ledezma, masm}@inf.uc3m.es


Post on 18-Jan-2018


DESCRIPTION

Outline: Motivation and Introduction; Sequence classification; Our approach (Library Creation, Classification); Target Environment Description; Experiments and Results; Conclusions and Future Work.

TRANSCRIPT

Page 1: Sequence Classification Using Statistical Pattern Recognition

José Antonio Iglesias, Agapito Ledezma, and Araceli Sanchis

Computer Science Department, Universidad Carlos III de Madrid

Avda. de la Universidad, 30. 28911 Leganés, Spain
{jiglesia, ledezma, masm}@inf.uc3m.es

Page 2: Outline

• Motivation and Introduction
• Sequence classification
• Our approach
  – Library Creation
  – Classification
• Target Environment Description
• Experiments and Results
• Conclusions and Future Work

Page 3: Outline (current section: Motivation and Introduction)

Page 4: Motivation

Opponent behavior modelling / classification (environment: soccer simulation domain)

[Diagram: Opponent Modeling. Labels: Pattern Recognition; Off-Line Analysis; No-Pattern LogFile; Pattern LogFile; Base Strategy; Pattern; Recognized Patterns; On-Line Comparing Method; Pattern Detection; On-Line Detection; Environment Information; Advice to Players; RoboCup Soccer Server; Pattern Recognized]

Page 5: Introduction

• Behavior Classification
• Behavior as a sequence of elements
• Sequence Classification

Sequence: "set of elements ordered so that they can be labelled with the positive integers" (Merriam-Webster Dictionary)

Page 6: Outline (current section: Sequence classification)

Page 7: Sequence classification

• Given:
  – Classes $C = \{c_1, c_2, \dots, c_n\}$
  – Sequence $E = \{e_1, e_2, \dots, e_n\}$
• Determine: which class $c_i \in C$ the sequence $E$ belongs to.

Page 8: Outline (current section: Our approach)

Page 9: Our approach

[Diagram with two stages.
Library Creation: Sequence 1 of Class 1 (pwd fs fg …), Sequence 2 of Class 2 (vi man ls …), …, Sequence n of Class n (… finger more ls ...) are turned into Pattern 1, Pattern 2, Pattern 3, which form the Pattern Library.
Classification: the Sequence to classify (vi more ls …) is turned into a Pattern to classify, which is compared with each pattern of the library (Compare_Patterns). On-Line Sequence Classification gives the Classification Result: SEQUENCE CLASS.]

Page 10: Outline (current section: Our approach, Library Creation)

Page 11: Library Creation

Trie (retrieval) data structure:

• Special search tree used for storing elements and their prefixes.
• Every node:
  – represents an element
  – stores useful information (times appeared, …)

Page 12: Library Creation - An example trie

Sequence to insert initially in the trie: {pwd vi pwd vi pwd ls}

Page 13: Library Creation - An example trie

Sequence to insert initially in the trie: {pwd vi pwd vi pwd ls}

Sub-sequence length: 3

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls}
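The segmentation step on this slide can be sketched as follows. This is a minimal Python sketch, assuming the sequence is cut into consecutive, non-overlapping chunks of the chosen length, which is what the example suggests. The function name `split_subsequences` is hypothetical, and the slides do not say how a final chunk shorter than the chosen length is handled (here it is simply kept).

```python
def split_subsequences(sequence, length):
    """Cut a sequence into consecutive, non-overlapping chunks of `length` elements."""
    if length <= 0:
        raise ValueError("length must be positive")
    # Assumption: a trailing chunk shorter than `length` is kept as-is.
    return [sequence[i:i + length] for i in range(0, len(sequence), length)]


commands = ["pwd", "vi", "pwd", "vi", "pwd", "ls"]
chunks = split_subsequences(commands, 3)
print(chunks)
```

For the slide's sequence and a sub-sequence length of 3, this yields the two sub-sequences {pwd vi pwd} and {vi pwd ls}.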

Page 14: Library Creation - An example trie

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls}

Root

Page 15: Library Creation - An example trie

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls}
Inserting: {pwd vi pwd}

Root
  pwd [1]
    vi [1]
      pwd [1]

Page 16: Library Creation - An example trie

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls}
Inserting the suffix {vi pwd} of the first sub-sequence

Root
  pwd [1]
    vi [1]
      pwd [1]
  vi [1]
    pwd [1]

Page 17: Library Creation - An example trie

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls}
Inserting the suffix {pwd} of the first sub-sequence

Root
  pwd [2]
    vi [1]
      pwd [1]
  vi [1]
    pwd [1]

Page 18: Library Creation - An example trie

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls}
Inserting: {vi pwd ls}

Root
  pwd [2]
    vi [1]
      pwd [1]
  vi [2]
    pwd [2]
      ls [1]

Page 19: Library Creation - An example trie

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls}
Inserting the suffix {pwd ls} of the second sub-sequence

Root
  pwd [3]
    vi [1]
      pwd [1]
    ls [1]
  vi [2]
    pwd [2]
      ls [1]

Page 20: Library Creation - An example trie

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls}
Inserting the suffix {ls} of the second sub-sequence

Root
  pwd [3]
    vi [1]
      pwd [1]
    ls [1]
  vi [2]
    pwd [2]
      ls [1]
  ls [1]

Page 21: Library Creation - An example trie

Final trie for the sequence {pwd vi pwd vi pwd ls}, built from the sub-sequences {pwd vi pwd} and {vi pwd ls}:

Root
  pwd [3]
    vi [1]
      pwd [1]
    ls [1]
  vi [2]
    pwd [2]
      ls [1]
  ls [1]
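The step-by-step construction above can be reproduced with a small counting trie. This is a sketch under assumptions, not the authors' implementation: the names `TrieNode`, `insert_with_suffixes` and `dump` are hypothetical, and the rule "insert each sub-sequence together with all of its suffixes, incrementing a counter on every node visited" is inferred from the node counts shown on the slides.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    count: int = 0                                  # times this node was visited
    children: dict = field(default_factory=dict)    # element -> TrieNode


def insert_with_suffixes(root, subsequence):
    """Insert a sub-sequence and every suffix of it, counting each visit."""
    for start in range(len(subsequence)):
        node = root
        for element in subsequence[start:]:
            node = node.children.setdefault(element, TrieNode())
            node.count += 1


def dump(node, depth=0):
    """Return the trie as indented 'element [count]' lines."""
    lines = []
    for element, child in node.children.items():
        lines.append("  " * depth + f"{element} [{child.count}]")
        lines.extend(dump(child, depth + 1))
    return lines


root = TrieNode()
for sub in (["pwd", "vi", "pwd"], ["vi", "pwd", "ls"]):
    insert_with_suffixes(root, sub)
print("\n".join(dump(root)))
```

Running it on the two sub-sequences of the example gives the same counts as the final slide: pwd [3], vi [2] and ls [1] directly under the root.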

Page 22: Library Creation - Evaluating Dependences

Evaluate the relation/dependence between an element and its prefix.

Two approaches:
– Frequency-based method.
– Statistical dependence method.

Our approach: the statistical value used is the chi-square value. This value is stored in every node of the trie.

Page 23: Library Creation - Evaluating Dependences

2 x 2 Contingency Table

                   Event        Different event   Total
Prefix             O11          O12               O11 + O12
Different Prefix   O21          O22               O21 + O22
Total              O11 + O21    O12 + O22         O11 + O12 + O21 + O22

• O11: How many times the current node/element is followed by its prefix.
• O12: How many times the current node/element is followed by a different prefix.
• O21: How many times a different prefix (of the same length) is followed by the same node.
• O22: How many times a different prefix (of the same length) is followed by a different node.

$$E_{ij} = \frac{\text{Row}_i\ \text{Total} \times \text{Column}_j\ \text{Total}}{\text{Grand Total}}$$

$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
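The two formulas above can be checked numerically with a few lines of Python. This is a minimal sketch for the 2 x 2 case (r = k = 2). The function name `chi_square_2x2` and the observed counts in the example are hypothetical, and skipping cells whose expected count is zero is an assumption, since the slides do not say how that case is treated.

```python
def chi_square_2x2(o11, o12, o21, o22):
    """Chi-square statistic of a 2 x 2 contingency table of observed counts."""
    observed = [[o11, o12], [o21, o22]]
    row_totals = [o11 + o12, o21 + o22]
    col_totals = [o11 + o21, o12 + o22]
    grand_total = sum(row_totals)
    if grand_total == 0:
        return 0.0
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # E_ij = (row i total * column j total) / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:  # assumption: empty rows/columns contribute nothing
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts, only to exercise the formula.
value = chi_square_2x2(10, 20, 30, 40)
print(round(value, 4))
```

With these made-up counts the expected values are 12, 18, 28 and 42, so the statistic is 4/12 + 4/18 + 4/28 + 4/42, roughly 0.79. A table whose rows are proportional, such as (5, 5, 5, 5), gives 0.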

Page 24: Library Creation - Evaluating Dependences

Sequence Pattern Trie (nodes as listed on the slide: element [count] [chi-square value]):

Root
pwd [3]
vi [1] [5.1]
pwd [1] [4.3]
vi [2]
pwd [2] [3.5]
ls [1] [4.3]
ls [1] [4.3]
ls [2]

A Sequence Pattern Trie is created for each class.

Page 25: Outline (current section: Our approach, Classification)

Page 26: Classification

[Diagram, as on Page 9, with the Classification stage in focus. The Sequence to classify (vi more ls …) is turned into a Pattern to classify (Testing Trie). Compare_Patterns compares it with each pattern of the Pattern Library (Class Trie). On-Line Sequence Classification gives the ONLINE SEQUENCE CLASS.]

Page 27: Classification – Comparing Process

Testing Trie (nodes as listed on the slide: element [count] [chi-square value]):

Root
pwd [3]
vi [1] [5.1]
who [1] [4.3]
vi [2]
who [2] [3.5]

Class Trie:

Root
pwd [3]
vi [1] [7.1]
pwd [1] [7.3]
vi [2]
pwd [2] [1.5]
ls [1] [0.3]
ls [2]

If the node (and its prefix) are in both tries:
  If abs(chi2_TestingTrie – chi2_ClassTrie) ≤ ThresholdValue: Similarity between both tries.
  Result: [Element_TestingTrie, Prefix_TestingTrie, Chi2_TestingTrie]

Page 28: Classification – Comparing Process

(Same Testing Trie and Class Trie as on Page 27.)

If the node (and its prefix) are in both tries:
  If abs(5.1 – 7.1) ≤ ThresholdValue: Similarity between both tries.
  Result: [vi, pwd, 5.1]

Page 29: Classification – Comparing Process

(Same Testing Trie and Class Trie as on Page 27.)

If the node (and its prefix) are only in the Testing Trie: Difference between both tries.
  Result: [Element_TestingTrie, Prefix_TestingTrie, (Chi2_TestingTrie * -1)]

Page 30: Classification – Comparing Process

(Same Testing Trie and Class Trie as on Page 27.)

If the node (and its prefix) are only in the Testing Trie: Difference between both tries.
  Result: [who, pwd vi, (-4.3)]

Page 31: Classification – Comparing Process

(Same Testing Trie and Class Trie as on Page 27.)

If the node (and its prefix) are only in the Testing Trie: Difference between both tries.
  Result: [who, vi, (-3.5)]

Page 32: Classification – Comparing Process

Result:
[Element_1, Prefix_1, Value_1]
[Element_2, Prefix_2, Value_2]
[Element_3, Prefix_3, Value_3]
[Element_4, Prefix_4, Value_4]
…
[Element_n, Prefix_n, Value_n]

Each comparison (ClassTrie, TestingTrie): a comparison value.

Page 33: Classification – Comparing Process

Result:
[vi, pwd, +5.1]
[who, pwd vi, -4.3]
[who, vi, -3.5]

Comparison Value: -2.7
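The comparing process of Pages 27 to 33 can be sketched in Python as follows. Several points are assumptions rather than facts from the slides: each trie is flattened into a dictionary keyed by (prefix, element), the name `compare_tries` and the threshold of 2.5 are hypothetical, a shared node whose chi-square values differ by more than the threshold is assumed to contribute nothing, and the comparison value is taken to be the sum of the result values (which matches 5.1 - 4.3 - 3.5 = -2.7 in the example). The `ls` node of the Class Trie is left out because its position cannot be recovered from the slide; it does not affect the result, since the Testing Trie has no `ls` node.

```python
def compare_tries(testing, class_trie, threshold):
    """Compare a testing trie with a class trie.

    Both tries are dicts mapping (prefix_tuple, element) -> chi-square value.
    Returns the list of [element, prefix, value] results and their sum.
    """
    results = []
    for (prefix, element), chi2 in testing.items():
        key = (prefix, element)
        if key in class_trie:
            if abs(chi2 - class_trie[key]) <= threshold:
                results.append((element, prefix, chi2))   # similarity
            # assumption: outside the threshold, the node adds nothing
        else:
            results.append((element, prefix, -chi2))      # difference
    comparison_value = sum(value for _, _, value in results)
    return results, comparison_value


testing_trie = {
    (("pwd",), "vi"): 5.1,
    (("pwd", "vi"), "who"): 4.3,
    (("vi",), "who"): 3.5,
}
class_trie = {
    (("pwd",), "vi"): 7.1,
    (("pwd", "vi"), "pwd"): 7.3,
    (("vi",), "pwd"): 1.5,
}

results, comparison_value = compare_tries(testing_trie, class_trie, threshold=2.5)
print(results)
print(round(comparison_value, 1))
```

The three results correspond to the slide's [vi, pwd, +5.1], [who, pwd vi, -4.3] and [who, vi, -3.5].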

Page 34: Classification

[Diagram, as on Page 9. Each Compare_Patterns step between the Pattern to classify and a pattern of the Pattern Library (Pattern 1, Pattern 2, Pattern 3) produces a comparison value. On-Line Sequence Classification gives the ONLINE SEQUENCE CLASS.]

Page 35: Classification

[Diagram, as on Page 34. The ONLINE SEQUENCE CLASS is the class with the greatest comparison value.]

Page 36: Outline (current section: Target Environment Description)

Page 37: Environment – UNIX command line sequences

Raw sessions:

# Start session 1
cd ~/private/docs
ls -laF | more
cat foo.txt bar.txt zorch.txt > a.txt
exit
# End session 1

# Start session 2
cd ~/games/
xquake &
fg
…

Resulting token sequence:

**SOF** cd <1> ls -laF | more cat <3> > <1> exit **EOF** …

• cd <1>: one "file name" argument
• cat <3>: three "file name" arguments
• > <1>: one "file name" argument

Command histories of 9 UNIX computer users over 2 years. UCI Repository of ML Databases [Newman C., Hettich S., Merz, C. (1998)]
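To work with this format, the token stream has to be cut back into sessions. A minimal Python sketch, assuming the data is already available as a flat list of tokens in which `**SOF**` and `**EOF**` delimit each session (the function name `split_sessions` is hypothetical, and reading the tokens from the dataset files is left out):

```python
def split_sessions(tokens):
    """Group a flat token list into sessions delimited by **SOF** and **EOF**."""
    sessions = []
    current = None
    for token in tokens:
        if token == "**SOF**":
            current = []                  # start a new session
        elif token == "**EOF**":
            if current is not None:
                sessions.append(current)  # close the open session
            current = None
        elif current is not None:
            current.append(token)         # tokens outside a session are ignored
    return sessions


# Token sequence of session 1 as shown on the slide.
tokens = ["**SOF**", "cd", "<1>", "ls", "-laF", "|", "more",
          "cat", "<3>", ">", "<1>", "exit", "**EOF**"]
sessions = split_sessions(tokens)
print(sessions)
```

Each resulting session is a plain list of command tokens, which is the kind of sequence the library creation step takes as input.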

Page 38: Outline (current section: Experiments & Results)

Page 39: Experiments – UNIX command line sequences

9 files (users) containing from about 10,000 to 60,000 commands each.

1. Extracting Patterns: a trie is created for each user; together the tries form the Pattern Library.

Page 40: Experiments – UNIX command line sequences

9 files (users) containing from about 10,000 to 60,000 commands each.

1. Extracting Patterns: a trie is created for each user; together the tries form the Pattern Library.

2. Classification Algorithm: the sequence to classify (sequences of very different sizes) is assigned to the class with the greatest value (result value).

Page 41: Experiments – UNIX command line sequences

9 files (users) containing from about 10,000 to 60,000 commands each.

1. Extracting Patterns: a trie is created for each user; together the tries form the Pattern Library.

2. Classification Algorithm: the sequence to classify (sequences of very different sizes) is assigned to the class with the greatest value (result value).

3. Evaluating the result: calculate
  – Correct classification: difference between the greatest value and the second greatest value (+)
  – Incorrect classification: difference between the real classification value and the greatest value (-)

(The greater the difference, the better the classification)
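The evaluation measure of step 3 can be written as a short function. This is a sketch under assumptions: the name `evaluate` and the class names and scores in the example are hypothetical, and the sign convention (positive margin when the true class wins, negative otherwise) follows the (+) and (-) annotations on the slide.

```python
def evaluate(scores, true_class):
    """Signed margin of a classification.

    scores: dict mapping class name -> comparison value (at least two classes).
    Positive result: the true class won, by that margin over the runner-up.
    Negative result: the true class lost, by that margin to the winner.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if best == true_class:
        return scores[best] - scores[runner_up]
    return scores[true_class] - scores[best]


# Hypothetical comparison values for three classes.
scores = {"A": 12.0, "B": 7.5, "C": -3.0}
print(evaluate(scores, "A"))
print(evaluate(scores, "B"))
```

In this made-up example the margin is +4.5 when the true class is A and -4.5 when it is B.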

Page 42: Results – UNIX command line sequences

[Chart: Unix Commands Classification – User 6. Horizontal axis: Length of the Sequence to classify. Vertical axis: Classification Value. Average of 25 simulation results.]

Page 43: Results – UNIX command line sequences

[Chart: Minimum length for classifying a UNIX Computer User correctly. Horizontal axis: Unix Computer User (Class). Vertical axis: Length of the Sequence to classify.]

Page 44: Outline (current section: Conclusions and Future Work)

Page 45: Conclusions

• A threshold must be found
• Creating the tries takes a long time
• Results depend on the length of the sub-sequences used to create the trie

Page 46: Conclusions

• Effective method to classify UNIX users
• If a behavior can be represented by sequences, the proposed classification method can be used
• If a new class is added, only its trie must be created (the others are not modified)
• This method could be used for other tasks: sequence prediction, sequence clustering, …
• RoboCup Coach 2006 Competition (successful results)

Page 47: Future Work

• Pattern Library: one trie for all classes (users)
• Classification method without threshold value
• Analysis comparing our approach to others (HMMs)

Page 48: Thank you!

Page 49: Questions

Page 50: Related to Questions...

Page 51: Experiments – UNIX command line sequences

[Diagram. Pattern Library, one Pattern/Class per user:
USER 0 (Class 0): **SOF** cd <1> ls -laF | more cat <3> > …
USER 1 (Class 1): **SOF** ls <1> exit <1> ls -laF xquake & fg …
…
USER 8 (Class 8): **SOF** vi <1> vi <3> ls -la cat <2> …
Test User: **SOF** ls -laF | More cd <4> …
Sequence Classification: User On-Line ∈ Class c]

User On-Line vs Class User0: 21
User On-Line vs Class User1: 49
User On-Line vs Class User2: 9
User On-Line vs Class User3: 3
User On-Line vs Class User4: 12
User On-Line vs Class User5: 29
User On-Line vs Class User6: -1
User On-Line vs Class User7: 0
User On-Line vs Class User8: 11

Result: Class User1

Page 52: José Antonio Iglesias, Agapito Ledezma and Araceli Sanchis Sequence Classification Using Statistical Pattern Recognition Sequence Classification Using

Experiments – UNIX command line sequences

Sequence Classification: User On-Line ∈ Class c

User On-Line vs Class User0    21
User On-Line vs Class User1    49  (highlighted)
User On-Line vs Class User2     9
User On-Line vs Class User3     3
User On-Line vs Class User4    12
User On-Line vs Class User5    29
User On-Line vs Class User6    -1
User On-Line vs Class User7     0
User On-Line vs Class User8    11

Class User1

Page 53: José Antonio Iglesias, Agapito Ledezma and Araceli Sanchis Sequence Classification Using Statistical Pattern Recognition Sequence Classification Using

Experiments – UNIX command line sequences

Sequence Classification: User On-Line ∈ Class c

User On-Line vs Class User0    21
User On-Line vs Class User1    49  (highlighted)
User On-Line vs Class User2     9
User On-Line vs Class User3     3
User On-Line vs Class User4    12
User On-Line vs Class User5    29
User On-Line vs Class User6    -1
User On-Line vs Class User7     0
User On-Line vs Class User8    11

Class User1

Correctly Classified

Page 54: José Antonio Iglesias, Agapito Ledezma and Araceli Sanchis Sequence Classification Using Statistical Pattern Recognition Sequence Classification Using

Experiments – UNIX command line sequences

Sequence Classification: User On-Line ∈ Class c

User On-Line vs Class User0    21
User On-Line vs Class User1    49  (highlighted)
User On-Line vs Class User2     9
User On-Line vs Class User3     3
User On-Line vs Class User4    12
User On-Line vs Class User5    29
User On-Line vs Class User6    -1
User On-Line vs Class User7     0
User On-Line vs Class User8    11

Class User1

Correctly Classified

20

Page 55: José Antonio Iglesias, Agapito Ledezma and Araceli Sanchis Sequence Classification Using Statistical Pattern Recognition Sequence Classification Using

Experiments – UNIX command line sequences

Sequence Classification: User On-Line ∈ Class c

User On-Line vs Class User0    21
User On-Line vs Class User1    49
User On-Line vs Class User2     9
User On-Line vs Class User3     3
User On-Line vs Class User4    12
User On-Line vs Class User5    29
User On-Line vs Class User6    -1
User On-Line vs Class User7     0
User On-Line vs Class User8    11

Class User2

Page 56: José Antonio Iglesias, Agapito Ledezma and Araceli Sanchis Sequence Classification Using Statistical Pattern Recognition Sequence Classification Using

Experiments – UNIX command line sequences

Sequence Classification: User On-Line ∈ Class c

User On-Line vs Class User0    21
User On-Line vs Class User1    49
User On-Line vs Class User2     9  (highlighted)
User On-Line vs Class User3     3
User On-Line vs Class User4    12
User On-Line vs Class User5    29
User On-Line vs Class User6    -1
User On-Line vs Class User7     0
User On-Line vs Class User8    11

Class User2

Page 57: José Antonio Iglesias, Agapito Ledezma and Araceli Sanchis Sequence Classification Using Statistical Pattern Recognition Sequence Classification Using

Experiments – UNIX command line sequences

Sequence Classification: User On-Line ∈ Class c

User On-Line vs Class User0    21
User On-Line vs Class User1    49  (highlighted)
User On-Line vs Class User2     9
User On-Line vs Class User3     3
User On-Line vs Class User4    12
User On-Line vs Class User5    29
User On-Line vs Class User6    -1
User On-Line vs Class User7     0
User On-Line vs Class User8    11

Class User2

Not Correctly Classified

Page 58: José Antonio Iglesias, Agapito Ledezma and Araceli Sanchis Sequence Classification Using Statistical Pattern Recognition Sequence Classification Using

Experiments – UNIX command line sequences

Sequence Classification: User On-Line ∈ Class c

User On-Line vs Class User0    21
User On-Line vs Class User1    49  (highlighted)
User On-Line vs Class User2     9  (highlighted)
User On-Line vs Class User3     3
User On-Line vs Class User4    12
User On-Line vs Class User5    29
User On-Line vs Class User6    -1
User On-Line vs Class User7     0
User On-Line vs Class User8    11

Class User2

Not Correctly Classified

-40
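The two outcomes shown on these slides (correctly classified when the class is User1, not correctly classified when it is User2) can be reproduced with a short Python sketch. It assumes the highest score decides the predicted class and that the class label shown on the slide is the test user's true class; `is_correctly_classified` is a hypothetical helper name, and the scores are copied from the slides.

```python
# Scores from the slides: on-line user compared against each class.
scores = {
    "User0": 21, "User1": 49, "User2": 9,
    "User3": 3, "User4": 12, "User5": 29,
    "User6": -1, "User7": 0, "User8": 11,
}


def is_correctly_classified(scores, true_class):
    """Compare the highest-scoring class (assumed rule) with the true class."""
    predicted = max(scores, key=scores.get)
    return predicted == true_class


print(is_correctly_classified(scores, "User1"))  # True
print(is_correctly_classified(scores, "User2"))  # False
```

In the second case the true class User2 scores only 9 against 49 for User1, so the prediction does not match.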