TRANSCRIPT
Inaugural Keynote: Pattern Analysis and Synthesis
Prof M.N. Murty, Dean, Faculty of Engineering
and Professor, Dept. of Comp. Science and
Automation, IISc
Pattern Analysis and Synthesis
M. Narasimha Murty, Professor, Dept. of CSA
IISc, [email protected]
February 20, 2016
Flooding of Data
• More people are generating data than are analysing it! What happens if machines become aggressive in synthesis?
– Web, text, images, ...
– Business transactions, calls, ...
– Scientific data: astronomy, biology, etc.
• More data is captured:
– Storage technology is faster and cheaper
– DBMSs can handle bigger databases
• Pattern synthesis is useful when the data is small
• Learning from examples (supervised learning: classification and regression)
• Learning from observations (unsupervised learning)
Man-Machine Interaction
Databases
Data Mining
Machine Learning
Pattern Recognition
Machine Learning and Data Mining
• Association Rule Mining
• Sentiment Mining
• Summarization
• Question Answering
Representation of Objects
An instance is represented as a
• Vector in a multi-dimensional space: for example, (30, 1) represents an object with 30 units of weight and 1 unit of height
– Statistical recognition and neural nets
– Fuzzy and rough clustering
• String of characters/primitives:
– Description in a logic; for example, (color = red ∨ white) ∧ (make = leather) ∧ (shape = sphere)
– Conceptual, knowledge-based, itemset-based, symbolic, and structural data mining
Representation is Crucial
INCOME (PM in Rs.)   Class Label
20000                Tall
40000                Short
60000                Tall
80000                Short

INCOME is 55000; what is the class?
Bag of Words
[Figure: scatter plot of two well-separated groups of patterns, marked x and O]
PCA may not work
[Figure: two classes (CLASS 1, CLASS 2) plotted against principal components PC1 and PC2]
Used in
• face recognition (eigenfaces; Pentland, MIT) and
• text analysis (latent semantic analysis; Papadimitriou, Univ. of California, Berkeley)
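PCA projects the data onto the directions of largest variance. A minimal NumPy sketch (the function `pca` and the toy data are ours) also illustrates the "PCA may not work" slide: when the class-discriminating direction carries little variance, the first principal component mixes the two classes.

```python
import numpy as np

def pca(X, k):
    """Project rows of X onto the top-k principal components."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = np.cov(Xc, rowvar=False)            # features x features covariance
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]        # pick the top-k directions
    components = vecs[:, order]
    return Xc @ components, components

# Two elongated classes separated along the LOW-variance (y) direction:
rng = np.random.default_rng(0)
c1 = rng.normal([0, 0], [10, 0.3], size=(200, 2))
c2 = rng.normal([0, 3], [10, 0.3], size=(200, 2))
X = np.vstack([c1, c2])
Z, components = pca(X, 1)   # PC1 follows the high-variance x-axis,
                            # so the 1-D projection mixes the classes
```

Here the classes are 3 units apart along y, yet the 1-D PCA projection discards exactly that direction, which is the failure mode sketched in the figure.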
Large Datasets
• Data is large; memory is small
• Solutions?
– Incremental algorithms
– Divide-and-conquer strategy
– Compress the data and process
• These solutions are used in
– Clustering
– Classification
– Association Rule Mining
– Information Retrieval
Divide-and-Conquer
1. Divide the set of n patterns into p blocks, each of size n/p patterns.
2. Cluster the patterns in each block into K clusters.
3. Collect the pK representatives and cluster them into K clusters.
4. Relabel the clusters obtained in step 2.
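The four steps above can be sketched with plain Lloyd's k-means as the block-level clusterer (the function names, the deterministic initialization, and the toy data are ours, not from the talk):

```python
import numpy as np

def kmeans(X, K, iters=20):
    """Plain Lloyd's k-means with a simple deterministic init."""
    C = X[np.linspace(0, len(X) - 1, K).astype(int)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(K):
            if (labels == k).any():
                C[k] = X[labels == k].mean(0)
    return C, labels

def two_level_kmeans(X, K, p):
    """Steps 1-3: cluster each of p blocks, then cluster the pK representatives."""
    blocks = np.array_split(X, p)
    reps = np.vstack([kmeans(b, K)[0] for b in blocks])   # pK representatives
    C, _ = kmeans(reps, K)
    # Step 4: relabel every original pattern by its nearest final centroid
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return C, d.argmin(1)

rng = np.random.default_rng(1)
a = rng.normal(0, 0.5, (200, 2))
b = rng.normal(100, 0.5, (200, 2))
X = np.vstack([a, b])
C, labels = two_level_kmeans(X, K=2, p=4)
```

Only the pK representatives, not the full data set, are clustered at the second level, which is what makes the scheme memory-friendly.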
What are the Characteristics?
• What is the optimal number of blocks (p)?

# Patterns   # Blocks   # Clusters   # Distances (1-level)   # Distances (2-level)
100          2          5            4,950                   2,495
1000         10         5            499,500                 50,725
10000        100        5            49,995,000              619,750

• Find the optimal number of blocks to minimize the number of distance computations.
• It is possible to show that the two-level algorithm gives a partition that is a constant-factor approximation, in terms of quality (fitness or accuracy), of the partition produced by the one-level algorithm.
(References: Guha, Rastogi, Motwani, and O'Callaghan, Clustering Data Streams, FOCS 2000; Murty and Krishna, Efficient Technique for Data Clustering, Pattern Recognition, 1980.)
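The distance counts in the table follow directly from pairwise-distance arithmetic: the one-level algorithm needs C(n, 2) distances, while the two-level scheme needs p·C(n/p, 2) within the blocks plus C(pK, 2) among the representatives. A quick check (the function names are ours):

```python
from math import comb

def one_level(n):
    # All pairwise distances over n patterns
    return comb(n, 2)

def two_level(n, p, K):
    # p blocks of n/p patterns each, then pK representatives
    return p * comb(n // p, 2) + comb(p * K, 2)

print(one_level(100), two_level(100, 2, 5))        # 4950 2495
print(one_level(10000), two_level(10000, 100, 5))  # 49995000 619750
```

Minimizing `two_level(n, p, K)` over p gives the optimal number of blocks asked for on the slide.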
Classification
Learn a method for predicting the class of an instance from pre-labeled (classified) instances.
Many approaches: KNNC, decision trees, neural networks, AdaBoost, SVM, ...
[Figure: Weight vs. Height scatter with Chair and Human classes and a test pattern]
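The k-nearest-neighbour classifier (KNNC) behind the Chair/Human picture can be sketched in a few lines; the (weight, height) values below are made-up illustrations, not data from the talk:

```python
import numpy as np

def knn_classify(train_X, train_y, x, k=1):
    """Label x by majority vote among its k nearest training patterns."""
    d = ((train_X - x) ** 2).sum(axis=1)        # squared Euclidean distances
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[counts.argmax()]

# Toy (weight, height) data for the Chair vs. Human picture:
X = np.array([[5.0, 1.0], [6.0, 1.2], [70.0, 1.7], [80.0, 1.8]])
y = np.array(["Chair", "Chair", "Human", "Human"])
print(knn_classify(X, y, np.array([75.0, 1.75]), k=3))  # Human
```

With k = 1 this is the plain NNC used throughout the later slides.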
Pattern Synthesis

Given objects: Pin Cushion and Paper Weight
Synthesized object: Pin Cushion-cum-Paper Weight

• The functions of the pin cushion do not interfere with the functionality of the paper weight.
• The usage pattern of the two types of objects indicates that they are geographically closely located; both are found on an office table.
• Handy, heavy (for the paper weight), and soft-surfaced (for the pin cushion) are the required functions.
Pattern Synthesis in Classification
• Generation of artificial patterns based on a given model or examples.
• Why is it needed?
– For high-dimensional data, to reduce the curse-of-dimensionality effect; the demand for a large number of training patterns grows with the dimensionality of the feature space.
– Getting real-world training patterns is sometimes costly; pattern synthesis can be useful here.
• In numerical taxonomy, small datasets were used; in such cases Bayesian methods are useful.
• When we have large datasets, frequentist approaches are fine.
NNC based on Bootstrapping
• Consider two-class training data:

Negative class                       Positive class
(1 1 1 1), (2 2 2 2)                 (6 6 6 7), (7 7 7 6)
(1 1 1 2), (2 2 2 1)                 (6 6 6 6), (7 7 7 7)
(1 2 2 1), (1 2 1 1)                 (6 7 7 6), (6 7 6 6)
(2 2 1 1), (1 1 2 1), (4 4 4 4)      (7 7 6 6), (7 6 6 7), (3 3 3 3)

Negative (bootstrapped)                                   Positive (bootstrapped, K = 3)
(1 1.25 1.25 1.25), (1.75 2 1.75 1.25)                    (6.25 6.25 6 6.5), (6.75 7 6.75 6.25)
(1.25 1.75 1.75 1), (1.25 1.75 1.25 1)                    (6 6.5 6.25 6.25), (6.75 7 6.75 6.25)
(1 1.25 1.25 1.25), (1.75 2 1.75 1.25)                    (6.25 6.25 6.5 6), (6.25 6.75 6.5 6)
(1.5 1.75 1.25 1), (1 1.25 1.5 1.25), (2.25 2.5 2.5 2)    (6.5 6.75 6.25 6), (6.5 6.25 6.25 6.75), (5.25 5.5 5.25 5.5)

• Experimental results show that NNC on the bootstrapped data performs better.
• This advantage comes from resampling the outliers.
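The slide does not spell out the bootstrapping rule; one common reading, assumed here, is that each training pattern is replaced by the mean of its K nearest neighbours within its own class (K = 3 as on the slide). A sketch under that assumption (the function name is ours):

```python
import numpy as np

def bootstrap_patterns(X, k=3):
    """Replace each pattern by the mean of its k nearest neighbours
    (including itself) within the same class."""
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    for i, x in enumerate(X):
        d = ((X - x) ** 2).sum(axis=1)
        nearest = np.argsort(d)[:k]      # self has distance 0
        out[i] = X[nearest].mean(axis=0)
    return out

neg = [(1,1,1,1), (2,2,2,2), (1,1,1,2), (2,2,2,1),
       (1,2,2,1), (1,2,1,1), (2,2,1,1), (1,1,2,1), (4,4,4,4)]
boot = bootstrap_patterns(neg, k=3)
# The outlier (4 4 4 4) is averaged with (2 2 2 2) and (2 2 2 1),
# pulling it toward the rest of the class.
```

This is the smoothing effect the slide attributes the NNC improvement to: isolated outliers get dragged toward their class.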
Bootstrapping for Synthesis
• Let the training data set be as shown:
[Figure: training patterns of two classes (X's and O's), with a pattern M, a diameter Dia, and a bootstrapped pattern M']
NNC based on Divide-and-Conquer
• Let the test pattern be (2 2 2 2) in a 4-dimensional space.
• Let the training data set be:

No.  Class  Pattern
1    -      1 1 1 1
2    +      6 6 6 6
3    +      7 7 7 7
4    -      1 1 2 2
5    -      1 2 2 2
6    +      7 7 6 6
7    -      2 2 2 1
8    +      6 6 7 7

• It is possible to exploit partial sums of the squared Euclidean distance.
• Splitting each pattern into two 2-dimensional blocks, in the first block the sub-pattern of pattern 7 is the NN and in the second block it is the sub-pattern of pattern 4.
• This implies a synthesized NN (2 2 2 2) of class "-" is being considered for the test pattern; note that (2 2 2 2) is not in the input training data.
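The partial-sum idea above can be sketched as follows: the squared Euclidean distance decomposes into a sum over feature blocks, so the nearest sub-pattern can be found per block and the winners concatenated (the function name is ours):

```python
import numpy as np

def blockwise_nn(train, labels, test, n_blocks=2):
    """For each block of features, find the nearest training sub-pattern;
    concatenating the winners gives a synthesized nearest neighbour."""
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    blocks = np.array_split(np.arange(train.shape[1]), n_blocks)
    pieces, piece_labels = [], []
    for idx in blocks:
        # partial sum of the squared Euclidean distance over this block
        d = ((train[:, idx] - test[idx]) ** 2).sum(axis=1)
        winner = d.argmin()
        pieces.append(train[winner, idx])
        piece_labels.append(labels[winner])
    return np.concatenate(pieces), piece_labels

train = [(1,1,1,1), (6,6,6,6), (7,7,7,7), (1,1,2,2),
         (1,2,2,2), (7,7,6,6), (2,2,2,1), (6,6,7,7)]
labels = ['-', '+', '+', '-', '-', '+', '-', '+']
nn, who = blockwise_nn(train, labels, (2, 2, 2, 2))
print(nn)   # [2. 2. 2. 2.] -- a synthesized pattern not in the training set
```

On the slide's data this picks pattern 7 in the first block and pattern 4 in the second, both of class "-", reproducing the synthesized NN (2 2 2 2).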
NNC with Partition-based Synthetic Patterns

Negative Class     Positive Class
1 1 1 1 1 1        6 6 6 6 6 6
2 2 2 2 2 2        7 7 7 7 7 7

Synthesized Negative     Synthesized Positive
1 1 1 1 1 1              6 6 6 6 6 6
1 1 1 1 2 2              6 6 6 6 7 7
1 1 2 2 1 1              6 6 7 7 6 6
1 1 2 2 2 2              6 6 7 7 7 7
2 2 1 1 1 1              7 7 6 6 6 6
2 2 1 1 2 2              7 7 6 6 7 7
2 2 2 2 1 1              7 7 7 7 6 6
2 2 2 2 2 2              7 7 7 7 7 7

• The NNs of the test pattern 1 1 2 2 1 1 from the synthesized data are 1 1 2 2 1 1 from the Negative class and 6 6 6 6 6 6 from the Positive class.
NNC with Partition-based Synthetic Patterns

Negative Class     Positive Class
1 1 1 1 1 1        6 6 6 6 6 6
2 2 2 2 2 2        7 7 7 7 7 7

Test Pattern: 1 1 2 2 1 1

BLOCK1   BLOCK2   BLOCK3   Class Label
1 1      2 2      1 1      Negative
6 6      6 6      6 6      Positive

• The NN sub-pattern is found in each block from each class.
• The NN is found by combining the sub-patterns.
• Here, the NN is 1 1 2 2 1 1, so the test pattern is labeled Negative.
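The synthesized sets above are block-wise cartesian products of the training patterns: partition each pattern into blocks, collect the sub-patterns seen in each block per class, and combine them in every way. A sketch (the function name is ours):

```python
import numpy as np
from itertools import product

def synthesize(patterns, n_blocks=3):
    """Partition each pattern into blocks and form every block-wise
    combination of the observed sub-patterns."""
    parts = [np.array_split(np.asarray(p), n_blocks) for p in patterns]
    choices = [{tuple(p[b]) for p in parts} for b in range(n_blocks)]
    return [sum(combo, ()) for combo in product(*choices)]

neg = synthesize([(1, 1, 1, 1, 1, 1), (2, 2, 2, 2, 2, 2)])
pos = synthesize([(6, 6, 6, 6, 6, 6), (7, 7, 7, 7, 7, 7)])
print(len(neg))                   # 8 synthetic patterns from 2 originals
print((1, 1, 2, 2, 1, 1) in neg)  # True -- the test pattern's NN exists
                                  # only in the synthesized set
```

Two training patterns per class thus yield 2^3 = 8 synthetic patterns, which is how the NN 1 1 2 2 1 1 (absent from the raw training data) becomes available to the classifier.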
Zipf’s Law
Theoretical (Linguistic/Structural or Statistical)
• Nake, F., and Rosenfeld, A. (Eds.), Graphic Languages, North-Holland, 1972.
• Siromoney, G. (Gift), Siromoney, R. (Rani), and Krithivasan, K. (Kamala), Abstract families of matrices and picture languages, CGIP (1), No. 3, November 1972, pp. 284-307.
• R. Narasimhan: his work on syntactic pattern recognition, carried out while he was spending a few years at Illinois, was seminal. He worked for over a decade on the modelling of natural-language behaviour and on the evolution of language behaviour, and authored several widely read books.
• FACEBOOK: brings structure into the picture again; the degree distribution in a social network follows a power law.
• P.C. Mahalanobis founded ISI in Kolkata on 17th December, 1931. He is known for the Mahalanobis distance, a weighted Euclidean distance.
• C. R. Rao was born on 10th September, 1920 at Hadagali in Karnataka. His family moved to Vishakapatnam in Andhra Pradesh. He received his master's degree in mathematics, ranking first in the examination. He then went to Calcutta in search of a job; he didn't get the job, but a chance visit to the Indian Statistical Institute (ISI) changed his life.
Trends
• Representation is crucial and we do not have good theories.
• Combination of content and structure: information networks
• Graph databases
• Probabilistic graphical models: integration of results from graph theory
• Approximate algorithms: input and output are abstractions
• When the prior is a power-law distribution, why use a Dirichlet prior?
• How to capture domain knowledge? It may be useful in getting a good representation.
Progress Information Excellence Towards an Enriched Profession, Business and Society
Community Focused
Volunteer Driven
Knowledge Share
Accelerated Learning
Collective Excellence
Distilled Knowledge
Shared, Non Conflicting Goals
Validation / Brainstorm platform
Mentor, Guide, Coach
Satisfied, Empowered Professional
Richer Industry and Academia
About Information Excellence Group
Reach us at:
blog: http://informationexcellence.wordpress.com/
linked in: http://www.linkedin.com/groups/Information-Excellence-3893869
Facebook: http://www.facebook.com/pages/Information-excellence-group/171892096247159
presentations: http://www.slideshare.net/informationexcellence
twitter: #infoexcel
email: [email protected]
Have you enriched yourself by contributing to the community Knowledge Share?