TRANSCRIPT
Inaugural Keynote: Pattern Analysis and Synthesis
Prof M.N. Murty, Dean, Faculty of Engineering
and Professor, Dept. of Comp. Science and
Automation, IISc
Pattern Analysis and Synthesis
M. Narasimha Murty, Professor, Dept. of CSA
IISc, [email protected]
February 20, 2016
Flooding of Data
• More people are generating data than are analysing it! What happens if machines become aggressive in synthesis?
– Web, text, images, ...
– Business transactions, calls, ...
– Scientific data: astronomy, biology, etc.
• More data is captured:
– Storage technology is faster and cheaper
– DBMSs can handle bigger databases
• Pattern synthesis is useful when the data is small
• Learning from examples (supervised learning: classification and regression)
• Learning from observations (unsupervised learning)
Man-Machine Interaction
Databases
Data Mining
Machine Learning
Pattern Recognition
Machine Learning and Data Mining
• Association Rule Mining
• Sentiment Mining
• Summarization
• Question Answering
Representation of Objects
An instance is represented as a
• Vector in a multi-dimensional space: for example, (30, 1) represents an object with 30 units of weight and 1 unit of height
– Statistical recognition and neural nets
– Fuzzy and rough clustering
• String of characters/primitives:
– Description in a logic; for example, (color = red ∨ white) ∧ (make = leather) ∧ (shape = sphere)
– Conceptual, knowledge-based, itemset-based, symbolic, and structural data mining
Representation is Crucial
INCOME (PM in Rs.)   Class Label
20000                Tall
40000                Short
60000                Tall
80000                Short

INCOME is 55000; what is the class?
Bag of Words
[Figure: scatter plot of two well-separated groups of patterns, marked x and O]
PCA may not work
[Figure: two classes (CLASS 1, CLASS 2) plotted against principal components PC1 and PC2]
Used in
• face recognition (eigenfaces; Pentland, MIT) and
• text analysis (latent semantic analysis; Papadimitriou, Univ. of California, Berkeley)
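PCA projects the data onto the directions of largest variance. A minimal NumPy sketch (the function `pca` and the toy data are ours) also illustrates the "PCA may not work" slide: when the class-discriminating direction carries little variance, the first principal component mixes the two classes.

```python
import numpy as np

def pca(X, k):
    """Project rows of X onto the top-k principal components."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = np.cov(Xc, rowvar=False)            # features x features covariance
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]        # pick the top-k directions
    components = vecs[:, order]
    return Xc @ components, components

# Two elongated classes separated along the LOW-variance (y) direction:
rng = np.random.default_rng(0)
c1 = rng.normal([0, 0], [10, 0.3], size=(200, 2))
c2 = rng.normal([0, 3], [10, 0.3], size=(200, 2))
X = np.vstack([c1, c2])
Z, components = pca(X, 1)   # PC1 follows the high-variance x-axis,
                            # so the 1-D projection mixes the classes
```

Here the classes are 3 units apart along y, yet the 1-D PCA projection discards exactly that direction, which is the failure mode sketched in the figure.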
Large Datasets
• Data is large; memory is small
• Solutions?
– Incremental algorithms
– Divide-and-conquer strategy
– Compress the data and process
• These solutions are used in
– Clustering
– Classification
– Association Rule Mining
– Information Retrieval
Divide-and-Conquer
1. Divide the set of n patterns into p blocks, each of size n/p patterns.
2. Cluster the patterns in each block into K clusters.
3. Collect the pK representatives and cluster them into K clusters.
4. Relabel the clusters obtained in step 2.
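The four steps above can be sketched with plain Lloyd's k-means as the block-level clusterer (the function names, the deterministic initialization, and the toy data are ours, not from the talk):

```python
import numpy as np

def kmeans(X, K, iters=20):
    """Plain Lloyd's k-means with a simple deterministic init."""
    C = X[np.linspace(0, len(X) - 1, K).astype(int)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(K):
            if (labels == k).any():
                C[k] = X[labels == k].mean(0)
    return C, labels

def two_level_kmeans(X, K, p):
    """Steps 1-3: cluster each of p blocks, then cluster the pK representatives."""
    blocks = np.array_split(X, p)
    reps = np.vstack([kmeans(b, K)[0] for b in blocks])   # pK representatives
    C, _ = kmeans(reps, K)
    # Step 4: relabel every original pattern by its nearest final centroid
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return C, d.argmin(1)

rng = np.random.default_rng(1)
a = rng.normal(0, 0.5, (200, 2))
b = rng.normal(100, 0.5, (200, 2))
X = np.vstack([a, b])
C, labels = two_level_kmeans(X, K=2, p=4)
```

Only the pK representatives, not the full data set, are clustered at the second level, which is what makes the scheme memory-friendly.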
What are the Characteristics?
• What is the optimal number of blocks (p)?

# Patterns   # Blocks   # Clusters   # Distances (1-level)   # Distances (2-level)
100          2          5            4,950                   2,495
1000         10         5            499,500                 50,725
10000        100        5            49,995,000              619,750

• Find the optimal number of blocks to minimize the number of distance computations.
• It is possible to show that the two-level algorithm gives a partition that is a constant-factor approximation, in terms of quality (fitness or accuracy), of the partition produced by the one-level algorithm.
(References: Guha, Rastogi, Motwani, and O'Callaghan, Clustering Data Streams, FOCS 2000; Murty and Krishna, Efficient Technique for Data Clustering, Pattern Recognition, 1980.)
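The distance counts in the table follow directly from pairwise-distance arithmetic: the one-level algorithm needs C(n, 2) distances, while the two-level scheme needs p·C(n/p, 2) within the blocks plus C(pK, 2) among the representatives. A quick check (the function names are ours):

```python
from math import comb

def one_level(n):
    # All pairwise distances over n patterns
    return comb(n, 2)

def two_level(n, p, K):
    # p blocks of n/p patterns each, then pK representatives
    return p * comb(n // p, 2) + comb(p * K, 2)

print(one_level(100), two_level(100, 2, 5))        # 4950 2495
print(one_level(10000), two_level(10000, 100, 5))  # 49995000 619750
```

Minimizing `two_level(n, p, K)` over p gives the optimal number of blocks asked for on the slide.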
Classification
Learn a method for predicting the class of an instance from pre-labeled (classified) instances.
Many approaches: KNNC, decision trees, neural networks, AdaBoost, SVM, ...
[Figure: Weight vs. Height scatter with Chair and Human classes and a test pattern]
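The k-nearest-neighbour classifier (KNNC) behind the Chair/Human picture can be sketched in a few lines; the (weight, height) values below are made-up illustrations, not data from the talk:

```python
import numpy as np

def knn_classify(train_X, train_y, x, k=1):
    """Label x by majority vote among its k nearest training patterns."""
    d = ((train_X - x) ** 2).sum(axis=1)        # squared Euclidean distances
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[counts.argmax()]

# Toy (weight, height) data for the Chair vs. Human picture:
X = np.array([[5.0, 1.0], [6.0, 1.2], [70.0, 1.7], [80.0, 1.8]])
y = np.array(["Chair", "Chair", "Human", "Human"])
print(knn_classify(X, y, np.array([75.0, 1.75]), k=3))  # Human
```

With k = 1 this is the plain NNC used throughout the later slides.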
Pattern Synthesis

Given objects: Pin Cushion and Paper Weight
Synthesized object: Pin Cushion-cum-Paper Weight

• The functions of the pin cushion do not interfere with the functionality of the paper weight.
• The usage pattern of the two types of objects indicates that they are geographically closely located; both are found on an office table.
• Handy, heavy (for the paper weight), and soft-surfaced (for the pin cushion) are the required functions.
Pattern Synthesis in Classification
• Generation of artificial patterns based on a given model or examples.
• Why is it needed?
– For high-dimensional data, to reduce the curse-of-dimensionality effect; the demand for a large number of training patterns grows with the dimensionality of the feature space.
– Getting real-world training patterns is sometimes costly; pattern synthesis can be useful here.
• In numerical taxonomy, small datasets were used; in such cases Bayesian methods are useful.
• When we have large datasets, frequentist approaches are fine.
NNC based on Bootstrapping
• Consider two-class training data:

Negative class                       Positive class
(1 1 1 1), (2 2 2 2)                 (6 6 6 7), (7 7 7 6)
(1 1 1 2), (2 2 2 1)                 (6 6 6 6), (7 7 7 7)
(1 2 2 1), (1 2 1 1)                 (6 7 7 6), (6 7 6 6)
(2 2 1 1), (1 1 2 1), (4 4 4 4)      (7 7 6 6), (7 6 6 7), (3 3 3 3)

Negative (bootstrapped)                                   Positive (bootstrapped, K = 3)
(1 1.25 1.25 1.25), (1.75 2 1.75 1.25)                    (6.25 6.25 6 6.5), (6.75 7 6.75 6.25)
(1.25 1.75 1.75 1), (1.25 1.75 1.25 1)                    (6 6.5 6.25 6.25), (6.75 7 6.75 6.25)
(1 1.25 1.25 1.25), (1.75 2 1.75 1.25)                    (6.25 6.25 6.5 6), (6.25 6.75 6.5 6)
(1.5 1.75 1.25 1), (1 1.25 1.5 1.25), (2.25 2.5 2.5 2)    (6.5 6.75 6.25 6), (6.5 6.25 6.25 6.75), (5.25 5.5 5.25 5.5)

• Experimental results show that NNC on the bootstrapped data performs better.
• This advantage comes from resampling the outliers.
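The slide does not spell out the bootstrapping rule; one common reading, assumed here, is that each training pattern is replaced by the mean of its K nearest neighbours within its own class (K = 3 as on the slide). A sketch under that assumption (the function name is ours):

```python
import numpy as np

def bootstrap_patterns(X, k=3):
    """Replace each pattern by the mean of its k nearest neighbours
    (including itself) within the same class."""
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    for i, x in enumerate(X):
        d = ((X - x) ** 2).sum(axis=1)
        nearest = np.argsort(d)[:k]      # self has distance 0
        out[i] = X[nearest].mean(axis=0)
    return out

neg = [(1,1,1,1), (2,2,2,2), (1,1,1,2), (2,2,2,1),
       (1,2,2,1), (1,2,1,1), (2,2,1,1), (1,1,2,1), (4,4,4,4)]
boot = bootstrap_patterns(neg, k=3)
# The outlier (4 4 4 4) is averaged with (2 2 2 2) and (2 2 2 1),
# pulling it toward the rest of the class.
```

This is the smoothing effect the slide attributes the NNC improvement to: isolated outliers get dragged toward their class.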
Bootstrapping for Synthesis
• Let the training data set be as shown:
[Figure: training patterns of two classes (X's and O's), with a pattern M, a diameter Dia, and a bootstrapped pattern M']
NNC based on Divide-and-Conquer
• Let the test pattern be (2 2 2 2) in a 4-dimensional space.
• Let the training data set be:

No.  Class  Pattern
1    -      1 1 1 1
2    +      6 6 6 6
3    +      7 7 7 7
4    -      1 1 2 2
5    -      1 2 2 2
6    +      7 7 6 6
7    -      2 2 2 1
8    +      6 6 7 7

• It is possible to exploit partial sums of the squared Euclidean distance.
• Splitting each pattern into two 2-dimensional blocks, in the first block the sub-pattern of pattern 7 is the NN and in the second block it is the sub-pattern of pattern 4.
• This implies a synthesized NN (2 2 2 2) of class "-" is being considered for the test pattern; note that (2 2 2 2) is not in the input training data.
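The partial-sum idea above can be sketched as follows: the squared Euclidean distance decomposes into a sum over feature blocks, so the nearest sub-pattern can be found per block and the winners concatenated (the function name is ours):

```python
import numpy as np

def blockwise_nn(train, labels, test, n_blocks=2):
    """For each block of features, find the nearest training sub-pattern;
    concatenating the winners gives a synthesized nearest neighbour."""
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    blocks = np.array_split(np.arange(train.shape[1]), n_blocks)
    pieces, piece_labels = [], []
    for idx in blocks:
        # partial sum of the squared Euclidean distance over this block
        d = ((train[:, idx] - test[idx]) ** 2).sum(axis=1)
        winner = d.argmin()
        pieces.append(train[winner, idx])
        piece_labels.append(labels[winner])
    return np.concatenate(pieces), piece_labels

train = [(1,1,1,1), (6,6,6,6), (7,7,7,7), (1,1,2,2),
         (1,2,2,2), (7,7,6,6), (2,2,2,1), (6,6,7,7)]
labels = ['-', '+', '+', '-', '-', '+', '-', '+']
nn, who = blockwise_nn(train, labels, (2, 2, 2, 2))
print(nn)   # [2. 2. 2. 2.] -- a synthesized pattern not in the training set
```

On the slide's data this picks pattern 7 in the first block and pattern 4 in the second, both of class "-", reproducing the synthesized NN (2 2 2 2).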
NNC with Partition-based Synthetic Patterns

Negative Class     Positive Class
1 1 1 1 1 1        6 6 6 6 6 6
2 2 2 2 2 2        7 7 7 7 7 7

Synthesized Negative     Synthesized Positive
1 1 1 1 1 1              6 6 6 6 6 6
1 1 1 1 2 2              6 6 6 6 7 7
1 1 2 2 1 1              6 6 7 7 6 6
1 1 2 2 2 2              6 6 7 7 7 7
2 2 1 1 1 1              7 7 6 6 6 6
2 2 1 1 2 2              7 7 6 6 7 7
2 2 2 2 1 1              7 7 7 7 6 6
2 2 2 2 2 2              7 7 7 7 7 7

• The NNs of the test pattern 1 1 2 2 1 1 from the synthesized data are 1 1 2 2 1 1 from the Negative class and 6 6 6 6 6 6 from the Positive class.
NNC with Partition-based Synthetic Patterns

Negative Class     Positive Class
1 1 1 1 1 1        6 6 6 6 6 6
2 2 2 2 2 2        7 7 7 7 7 7

Test Pattern: 1 1 2 2 1 1

BLOCK1   BLOCK2   BLOCK3   Class Label
1 1      2 2      1 1      Negative
6 6      6 6      6 6      Positive

• The NN sub-pattern is found in each block from each class.
• The NN is found by combining the sub-patterns.
• Here, the NN is 1 1 2 2 1 1, so the test pattern is labeled Negative.
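The synthesized sets above are block-wise cartesian products of the training patterns: partition each pattern into blocks, collect the sub-patterns seen in each block per class, and combine them in every way. A sketch (the function name is ours):

```python
import numpy as np
from itertools import product

def synthesize(patterns, n_blocks=3):
    """Partition each pattern into blocks and form every block-wise
    combination of the observed sub-patterns."""
    parts = [np.array_split(np.asarray(p), n_blocks) for p in patterns]
    choices = [{tuple(p[b]) for p in parts} for b in range(n_blocks)]
    return [sum(combo, ()) for combo in product(*choices)]

neg = synthesize([(1, 1, 1, 1, 1, 1), (2, 2, 2, 2, 2, 2)])
pos = synthesize([(6, 6, 6, 6, 6, 6), (7, 7, 7, 7, 7, 7)])
print(len(neg))                   # 8 synthetic patterns from 2 originals
print((1, 1, 2, 2, 1, 1) in neg)  # True -- the test pattern's NN exists
                                  # only in the synthesized set
```

Two training patterns per class thus yield 2^3 = 8 synthetic patterns, which is how the NN 1 1 2 2 1 1 (absent from the raw training data) becomes available to the classifier.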
Zipf’s Law
Theoretical (Linguistic/Structural or Statistical)
• Nake, F., and Rosenfeld, A. (Eds.), Graphic Languages, North-Holland, 1972.
• Siromoney, G. (Gift), Siromoney, R. (Rani), and Krithivasan, K. (Kamala), Abstract families of matrices and picture languages, CGIP (1), No. 3, November 1972, pp. 284-307.
• R. Narasimhan: his work on syntactic pattern recognition, carried out while he was spending a few years at Illinois, was seminal. He worked for over a decade on the modelling of natural-language behaviour and on the evolution of language behaviour, and authored several widely read books.
• FACEBOOK: brings structure into the picture again; the degree distribution in a social network follows a power law.
• P.C. Mahalanobis founded ISI in Kolkata on 17th December, 1931. He is known for the Mahalanobis distance, a weighted Euclidean distance.
• C. R. Rao was born on 10th September, 1920 at Hadagali in Karnataka. His family moved to Vishakapatnam in Andhra Pradesh. He received his master's degree in mathematics, ranking first in the examination. He then went to Calcutta in search of a job; he didn't get the job, but a chance visit to the Indian Statistical Institute (ISI) changed his life.
Trends
• Representation is crucial and we do not have good theories.
• Combination of content and structure: information networks
• Graph databases
• Probabilistic graphical models: integration of results from graph theory
• Approximate algorithms: input and output are abstractions
• When the prior is a power-law distribution, why use a Dirichlet prior?
• How to capture domain knowledge? It may be useful in getting a good representation.
Progress Information Excellence Towards an Enriched Profession, Business and Society
Community Focused
Volunteer Driven
Knowledge Share
Accelerated Learning
Collective Excellence
Distilled Knowledge
Shared, Non Conflicting Goals
Validation / Brainstorm platform
Mentor, Guide, Coach
Satisfied, Empowered Professional
Richer Industry and Academia
About Information Excellence Group
Reach us at:
blog: http://informationexcellence.wordpress.com/
linked in: http://www.linkedin.com/groups/Information-Excellence-3893869
Facebook: http://www.facebook.com/pages/Information-excellence-group/171892096247159
presentations: http://www.slideshare.net/informationexcellence
twitter: #infoexcel
email: [email protected]
Have you enriched yourself by contributing to the community Knowledge Share?