Interval Pattern Structures: An Introduction


A simple tutorial on “interval pattern structures”, a conceptual structure that can be derived from numerical data within Formal Concept Analysis.


Page 1: Interval Pattern Structures: An Introduction

Building Concept Lattices from Numerical Data with Pattern Structures

A tutorial

Mehdi Kaytoue and Amedeo Napoli

[email protected]

November, 9th 2010

Page 2

Context

Formal Concept Analysis
- works on binary relations
- classifies objects w.r.t. the attributes they have in common, within formal concepts (extent, intent)
- ordering the concepts gives a mathematical structure

Ganter & Wille, Formal Concept Analysis: Mathematical Foundations, Springer, 1999

Concept lattice: useful for many tasks
- simultaneous classification of objects and their attributes
- information organization
- knowledge discovery in databases (closed itemsets, association rules)
- information retrieval...

Valtchev et al., ICFCA 2004; Wille, JETAI 2002

2 / 32

Page 3

Problem and proposition

What about numerical data?

Transforming data into binary data is a general problem:
- conceptual scaling (binarization)
- important choices have to be made
- loss of information and of links between objects

Can binarization be avoided? Can a similarity relation between values be considered instead?


Page 4

Outline

1 Formal Concept Analysis

2 Pattern structures and intervals

3 Introducing a similarity relation

4 Conclusion

Page 5

Formal context

A formal context is given by (G, M, I) with
- G a set of objects
- M a set of attributes
- I a binary relation between objects and attributes: (g, m) ∈ I means that “object g owns attribute m”

Represented by a binary table

    m1  m2  m3
g1  ×       ×
g2  ×   ×
g3      ×   ×
g4      ×   ×
g5  ×   ×   ×

G = {g1, ..., g5}, M = {m1, m2, m3}, (g1, m3) ∈ I


Page 6

Galois connection

Two derivation operators form a Galois connection. The first gives the set of attributes common to all objects of a set A ⊆ G:

A′ = {m ∈ M | ∀g ∈ A ⊆ G : (g,m) ∈ I}

The second gives the set of objects owning all attributes in B ⊆ M:

B′ = {g ∈ G | ∀m ∈ B ⊆ M : (g,m) ∈ I}
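The two operators are easy to sketch in Python. This is a minimal illustration, not code from the tutorial: it assumes the context is stored as a dict mapping each object to its attribute set, and the crosses of g2, g3 and g4 are inferred from the illustrations on the next slides ({g1}′ = {m1,m3}, {m1,m3}′ = {g1,g5}, and the concept ({g1,g2,g5}, {m1})). The names `CONTEXT`, `objects_prime` and `attributes_prime` are hypothetical.

```python
# Binary context (G, M, I): each object maps to the set of attributes it owns.
# Assumption: crosses for g2, g3, g4 are inferred from the slide illustrations.
CONTEXT = {
    "g1": {"m1", "m3"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
G = set(CONTEXT)
M = {"m1", "m2", "m3"}


def objects_prime(A):
    """A′: attributes owned by every object of A (all of M if A is empty)."""
    common = set(M)
    for g in A:
        common &= CONTEXT[g]
    return common


def attributes_prime(B):
    """B′: objects owning every attribute of B."""
    return {g for g in G if set(B) <= CONTEXT[g]}


print(sorted(objects_prime({"g1"})))           # ['m1', 'm3']
print(sorted(attributes_prime({"m1", "m3"})))  # ['g1', 'g5']
```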


Page 7

Formal concepts

A formal concept is a pair (A, B) with A′ = B and B′ = A:
- A is the concept extent
- B is the concept intent

Illustration

{g1}′ = {m1,m3}

{m1,m3}′ = {g1,g5}

{g1,g5}′ = {m1,m3}

    m1  m2  m3
g1  ×       ×
g2  ×   ×
g3      ×   ×
g4      ×   ×
g5  ×   ×   ×

({g1,g5}, {m1,m3}) is a formal concept
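The illustration applies the two operators in sequence: starting from any set of objects A, the pair (A′′, A′) is always a formal concept. A minimal Python sketch of that closure step and of the concept test, assuming the same dict-based context as the slide (crosses of g2, g3, g4 inferred from the illustrations); `concept_from_objects` and `is_concept` are hypothetical names.

```python
# Toy context from the slide (assumption: g2, g3, g4 inferred from illustrations).
CONTEXT = {
    "g1": {"m1", "m3"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
G = set(CONTEXT)
M = {"m1", "m2", "m3"}


def objects_prime(A):
    """A′: attributes owned by every object of A (all of M if A is empty)."""
    common = set(M)
    for g in A:
        common &= CONTEXT[g]
    return common


def attributes_prime(B):
    """B′: objects owning every attribute of B."""
    return {g for g in G if set(B) <= CONTEXT[g]}


def concept_from_objects(A):
    """Close a set of objects: (A′′, A′) is always a formal concept."""
    intent = objects_prime(A)
    return attributes_prime(intent), intent


def is_concept(A, B):
    """(A, B) is a formal concept iff A′ = B and B′ = A."""
    return objects_prime(A) == set(B) and attributes_prime(B) == set(A)


extent, intent = concept_from_objects({"g1"})
print(sorted(extent), sorted(intent))    # ['g1', 'g5'] ['m1', 'm3']
print(is_concept({"g1"}, {"m1", "m3"}))  # False, since {m1, m3}′ = {g1, g5}
```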


Page 8

Concept lattice

Ordering relation on concepts

(A1,B1) ≤ (A2,B2)⇔ A1 ⊆ A2 (⇔ B2 ⊆ B1)

({g1,g5}, {m1,m3}) ≤ ({g1,g2,g5}, {m1})

Concept lattices have interesting properties:
- maximality
- specialization/generalization hierarchy
- synthetic representation of the data without loss of information
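For a context this small, the whole lattice can be obtained by brute force: close every subset of objects and keep the distinct results, then compare concepts by extent inclusion. The sketch below is a naive illustration (exponential in |G|), not one of the FCA algorithms the tutorial refers to; the context is the slide's table with g2, g3, g4 inferred from the illustrations, and `all_concepts` and `leq` are hypothetical names.

```python
from itertools import combinations

# Toy context from the slides (assumption: g2, g3, g4 inferred from illustrations).
CONTEXT = {
    "g1": {"m1", "m3"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
M = {"m1", "m2", "m3"}
OBJECTS = sorted(CONTEXT)


def objects_prime(A):
    """A′: attributes owned by every object of A (all of M if A is empty)."""
    common = set(M)
    for g in A:
        common &= CONTEXT[g]
    return common


def attributes_prime(B):
    """B′: objects owning every attribute of B."""
    return {g for g in OBJECTS if set(B) <= CONTEXT[g]}


def all_concepts():
    """Naive enumeration: close every subset of objects into (A′′, A′)."""
    concepts = set()
    for k in range(len(OBJECTS) + 1):
        for A in combinations(OBJECTS, k):
            intent = frozenset(objects_prime(A))
            extent = frozenset(attributes_prime(intent))
            concepts.add((extent, intent))
    return concepts


def leq(c1, c2):
    """(A1, B1) ≤ (A2, B2) iff A1 ⊆ A2 (equivalently B2 ⊆ B1)."""
    return c1[0] <= c2[0]


concepts = all_concepts()
c1 = (frozenset({"g1", "g5"}), frozenset({"m1", "m3"}))
c2 = (frozenset({"g1", "g2", "g5"}), frozenset({"m1"}))
print(len(concepts))                                # 8
print(c1 in concepts, c2 in concepts, leq(c1, c2))  # True True True
```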


Page 9

Problem

How can a lattice be built from numerical data?

How can “similar” objects be grouped in the same concepts?

    m1  m2  m3
g1  5   7   6
g2  6   8   4
g3  4   8   5
g4  4   9   8
g5  5   8   5

while avoiding discretization and its associated problems (thresholds, size of the binary table, information loss, computability, interpretation)


Page 10

Outline

1 Formal Concept Analysis

2 Pattern structures and intervals

3 Introducing a similarity relation

4 Conclusion

Page 11

First elements


Page 12

How to order object descriptions

Classical case: the lattice of attribute sets $(2^M, \subseteq)$. With $N, O \in 2^M$, one has

N ⊆ O ⇐⇒ N ∩O = N

For example, with M = {a,b}

{a} ⊆ {a,b} ⇐⇒ {a} ∩ {a,b} = {a}

Pattern case: ∩ has the properties of a meet $\sqcap$ in a semi-lattice

It is a “similarity operator”: it gives a description representing the similarity of its arguments

{a,b} ∩ {a,d} = {a}


Page 13

Pattern structures

A pattern structure is given by $(G, (D, \sqcap), \delta)$ with
- G a set of objects
- D a meet semi-lattice of object descriptions, called patterns
- δ a mapping associating with each object g ∈ G its description δ(g) ∈ D

Patterns from $(D, \sqcap)$ are ordered by

$c \sqsubseteq d \iff c \sqcap d = c, \quad \forall c, d \in D$
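The order is derived from the meet alone, so it can be written once for any meet operation. A minimal Python sketch, assuming patterns can be compared with `==`; here it is instantiated with set intersection, the classical case of the previous slide. `leq_from_meet` and `set_meet` are hypothetical names.

```python
def leq_from_meet(meet, c, d):
    """c ⊑ d iff c ⊓ d = c, for any meet operation on patterns."""
    return meet(c, d) == c


def set_meet(n, o):
    """Classical case: the meet of two attribute sets is their intersection."""
    return n & o


print(set_meet({"a", "b"}, {"a", "d"}))            # {'a'}
print(leq_from_meet(set_meet, {"a"}, {"a", "b"}))  # True, i.e. {a} ⊆ {a, b}
print(leq_from_meet(set_meet, {"a", "b"}, {"a"}))  # False
```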

A Galois connection between $(2^G, \subseteq)$ and $(D, \sqsubseteq)$ gives rise to a (pattern) concept lattice

Existing FCA algorithms (based on closure computation) can easily be adapted

Ganter & Kuznetsov, ICCS 2001


Page 14

Intervals are patterns

Let $[a_1, b_1]$ and $[a_2, b_2]$ be two intervals

Their meet is the smallest interval containing both: $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$

$[4,4] \sqcap [5,5] = [4,5]$

Their order is given by

$[a_1, b_1] \sqsubseteq [a_2, b_2] \iff [a_1, b_1] \sqcap [a_2, b_2] = [a_1, b_1]$

$[4,5] \sqsubseteq [5,5] \iff [4,5] \sqcap [5,5] = [4,5]$

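This gives a semi-lattice $(D, \sqcap)$, or equivalently the ordered set $(D, \sqsubseteq)$; note that a wider interval is a more general (smaller) pattern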
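The interval meet and its induced order fit in a few lines. A minimal Python sketch, assuming an interval is represented as a `(low, high)` tuple; `interval_meet` and `interval_leq` are hypothetical names. It reproduces the slide's examples and shows that the order runs opposite to inclusion.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def interval_meet(x: Interval, y: Interval) -> Interval:
    """[a1,b1] ⊓ [a2,b2] = [min(a1,a2), max(b1,b2)]: smallest interval containing both."""
    (a1, b1), (a2, b2) = x, y
    return (min(a1, a2), max(b1, b2))


def interval_leq(x: Interval, y: Interval) -> bool:
    """x ⊑ y iff x ⊓ y = x, i.e. x contains y (a wider interval is more general)."""
    return interval_meet(x, y) == x


print(interval_meet((4, 4), (5, 5)))  # (4, 5)
print(interval_leq((4, 5), (5, 5)))   # True
print(interval_leq((5, 5), (4, 5)))   # False
```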


Page 15

Interval vectors are patterns

Given the two following interval vectors

$e = \langle [a_i, b_i] \rangle_{i \in [1,p]}$ and $f = \langle [c_i, d_i] \rangle_{i \in [1,p]}$

Their meet is

$e \sqcap f = \langle [a_i, b_i] \sqcap [c_i, d_i] \rangle_{i \in [1,p]}$

$\langle [4,4], [3,4] \rangle \sqcap \langle [2,3], [2,6] \rangle = \langle [2,4], [2,6] \rangle$

Their order is given by

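$e \sqsubseteq f \iff [a_i, b_i] \sqsubseteq [c_i, d_i], \ \forall i \in [1,p]$

$\langle [2,4], [2,6] \rangle \sqsubseteq \langle [4,4], [3,4] \rangle$ since $[2,4] \sqsubseteq [4,4]$ and $[2,6] \sqsubseteq [3,4]$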
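Interval vectors simply apply the interval meet component-wise. A minimal Python sketch, assuming a vector is a tuple of `(low, high)` tuples of equal length p; `vector_meet` and `vector_leq` are hypothetical names. It checks the two examples of this slide.

```python
from typing import Tuple

Interval = Tuple[float, float]
Vector = Tuple[Interval, ...]


def interval_meet(x: Interval, y: Interval) -> Interval:
    """Smallest interval containing both arguments."""
    return (min(x[0], y[0]), max(x[1], y[1]))


def vector_meet(e: Vector, f: Vector) -> Vector:
    """Component-wise meet of two interval vectors of the same length p."""
    if len(e) != len(f):
        raise ValueError("interval vectors must have the same length")
    return tuple(interval_meet(x, y) for x, y in zip(e, f))


def vector_leq(e: Vector, f: Vector) -> bool:
    """e ⊑ f iff [a_i, b_i] ⊑ [c_i, d_i] for every component i."""
    return vector_meet(e, f) == tuple(e)


e = ((4, 4), (3, 4))
f = ((2, 3), (2, 6))
print(vector_meet(e, f))                # ((2, 4), (2, 6))
print(vector_leq(((2, 4), (2, 6)), e))  # True
```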


Page 16

Galois connection

Two operators

The first gives the description representing the similarity of a set of objects:

$A^\square = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$

The second gives the maximal set of objects sharing a given description:

$d^\square = \{ g \in G \mid d \sqsubseteq \delta(g) \}$ for $d \in (D, \sqcap)$

Ganter & Kuznetsov, ICCS 2001
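On numerical data, the two operators can be sketched directly from the definitions. This is an illustration under assumptions, not the authors' implementation: each value v of an object becomes the degenerate interval [v, v], patterns are tuples of `(low, high)` pairs, and the meet over the empty set (which would require an extra top element of D) is not modelled. The data is the numerical table of the tutorial; `box_objects` and `box_pattern` are hypothetical names for the two □ operators.

```python
from functools import reduce
from typing import Dict, Iterable, Set, Tuple

Interval = Tuple[float, float]
Pattern = Tuple[Interval, ...]

# Numerical table from the slides: values of attributes m1, m2, m3.
DATA: Dict[str, Tuple[float, ...]] = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def delta(g: str) -> Pattern:
    """Description of g: each value v becomes the degenerate interval [v, v]."""
    return tuple((v, v) for v in DATA[g])


def meet(e: Pattern, f: Pattern) -> Pattern:
    """Component-wise interval meet [min, max]."""
    return tuple((min(a, c), max(b, d)) for (a, b), (c, d) in zip(e, f))


def leq(e: Pattern, f: Pattern) -> bool:
    """e ⊑ f iff e ⊓ f = e."""
    return meet(e, f) == e


def box_objects(A: Iterable[str]) -> Pattern:
    """A□: meet of the descriptions of the objects in A (A must be non-empty here)."""
    descriptions = [delta(g) for g in A]
    if not descriptions:
        raise ValueError("the empty set would need a top element, not modelled here")
    return reduce(meet, descriptions)


def box_pattern(d: Pattern) -> Set[str]:
    """d□: all objects whose description is subsumed by d."""
    return {g for g in DATA if leq(d, delta(g))}


d = box_objects({"g1", "g2"})
print(d)                       # ((5, 6), (7, 8), (4, 6))
print(sorted(box_pattern(d)))  # ['g1', 'g2', 'g5']
```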


Page 17

Numerical data are pattern structures

    m1  m2  m3
g1  5   7   6
g2  6   8   4
g3  4   8   5
g4  4   9   8
g5  5   8   5

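$\{g_1, g_2\}^\square = \bigsqcap_{g \in \{g_1, g_2\}} \delta(g) = \delta(g_1) \sqcap \delta(g_2)$

$= \langle [5,5], [7,7], [6,6] \rangle \sqcap \langle [6,6], [8,8], [4,4] \rangle$

$= \langle [5,5] \sqcap [6,6], [7,7] \sqcap [8,8], [6,6] \sqcap [4,4] \rangle$

$= \langle [5,6], [7,8], [4,6] \rangle$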


Page 18

Numerical data are pattern structures

    m1  m2  m3
g1  5   7   6
g2  6   8   4
g3  4   8   5
g4  4   9   8
g5  5   8   5

$\langle [5,6], [7,8], [4,6] \rangle^\square = \{ g \in G \mid \langle [5,6], [7,8], [4,6] \rangle \sqsubseteq \delta(g) \} = \{g_1, g_2, g_5\}$

({g1,g2,g5}, 〈[5,6], [7,8], [4,6]〉) is a concept
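Repeating this closure for every non-empty set of objects yields all interval pattern concepts of the table. The sketch below is a brute-force illustration only (it enumerates all 2^|G| − 1 subsets), not one of the adapted FCA algorithms mentioned earlier; representations follow the same assumptions as above (degenerate intervals, tuples of `(low, high)` pairs), and `interval_pattern_concepts` is a hypothetical name.

```python
from functools import reduce
from itertools import combinations

# Numerical table from the slides: values of attributes m1, m2, m3.
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def delta(g):
    """Each value v becomes the degenerate interval [v, v]."""
    return tuple((v, v) for v in DATA[g])


def meet(e, f):
    """Component-wise interval meet [min, max]."""
    return tuple((min(a, c), max(b, d)) for (a, b), (c, d) in zip(e, f))


def leq(e, f):
    """e ⊑ f iff e ⊓ f = e."""
    return meet(e, f) == e


def box_objects(A):
    """A□ for a non-empty set of objects A."""
    return reduce(meet, (delta(g) for g in A))


def box_pattern(d):
    """d□: objects whose description is subsumed by d."""
    return frozenset(g for g in DATA if leq(d, delta(g)))


def interval_pattern_concepts():
    """Close every non-empty subset A: (A□□, A□) is a pattern concept."""
    objects = sorted(DATA)
    concepts = set()
    for k in range(1, len(objects) + 1):
        for A in combinations(objects, k):
            intent = box_objects(A)
            concepts.add((box_pattern(intent), intent))
    return concepts


concepts = interval_pattern_concepts()
target = (frozenset({"g1", "g2", "g5"}), ((5, 6), (7, 8), (4, 6)))
print(target in concepts)  # True
print(len(concepts))
```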


Page 19

Semantics in $\mathbb{R}^{|M|}$

Patterns are |M|-dimensional hyperrectangles. The ordering of patterns corresponds to (reverse) rectangle inclusion: $c \sqsubseteq d$ iff the hyperrectangle of d lies inside that of c


Page 20

General problem

- lowest concepts: few objects, small intervals
- highest concepts: many objects, large intervals
- overwhelming number of concepts: potentially one for each interval of values


Page 21

1 Formal Concept Analysis

2 Pattern structures and intervals

3 Introducing a similarity relation

4 Conclusion

Page 22

Introducing a similarity relation between objects

How can objects having similar values be grouped within the same concept?

A simple similarity relation

$a \simeq_\theta b \iff |a - b| \le \theta$

Examples

$2 \simeq_2 4$, $2 \not\simeq_3 7$
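The relation is a one-line test on absolute differences. A minimal Python sketch (the function name `similar` is hypothetical) checking the two examples: |2 − 4| = 2 ≤ 2, while |2 − 7| = 5 > 3.

```python
def similar(a: float, b: float, theta: float) -> bool:
    """a ≃θ b iff |a − b| ≤ θ."""
    return abs(a - b) <= theta


print(similar(2, 4, 2))  # True:  |2 − 4| = 2 ≤ 2
print(similar(2, 7, 3))  # False: |2 − 7| = 5 > 3
```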


Page 23

The meet: a similarity operator

Given two objects g and h, with descriptions δ(g) and δ(h) respectively, the similarity between g and h is represented by $\delta(\{g, h\}) = \delta(g) \sqcap \delta(h)$

For an arbitrary set of objects:
- all objects are similar (since $\sqcap$ can always be computed)
- their level of similarity depends on the level of the meet of their descriptions in the semi-lattice

How can a similarity relation based on a distance be taken into account?

Page 24

Towards a similarity between objects

Introduce an element $*$ in $(D, \sqcap)$ denoting dissimilarity:

$c \sqcap d \neq * \iff$ c and d are similar

$c \sqcap d = * \iff$ c and d are not similar

For intervals, the meet is constrained by a threshold θ

$[a,b] \sqcap_\theta [c,d] = [\min(a,c), \max(b,d)]$ if $\max(b,d) - \min(a,c) \le \theta$

$[a,b] \sqcap_\theta [c,d] = *$ otherwise

with θ = 0.2

In effect, this simply “cuts” the semi-lattice: any meet wider than θ collapses to ∗
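The thresholded meet only adds a width check to the ordinary interval meet. A minimal Python sketch, assuming `None` stands for the dissimilarity element ∗ and that ∗ absorbs any argument it is met with (an assumption of this sketch, consistent with ∗ acting as a bottom element); `meet_theta` is a hypothetical name.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]
STAR = None  # stands for the dissimilarity element ∗


def meet_theta(x: Optional[Interval], y: Optional[Interval], theta: float) -> Optional[Interval]:
    """[a,b] ⊓θ [c,d]: the usual meet if its width is at most θ, otherwise ∗."""
    if x is STAR or y is STAR:
        return STAR  # assumption: ∗ absorbs everything
    (a, b), (c, d) = x, y
    low, high = min(a, c), max(b, d)
    return (low, high) if high - low <= theta else STAR


print(meet_theta((4, 4), (5, 5), 1))  # (4, 5)
print(meet_theta((4, 4), (6, 6), 1))  # None, i.e. ∗
```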

Page 25

Going further?

$\simeq_\theta$ is reflexive and symmetric but not transitive, i.e. it is a tolerance relation rather than an equivalence relation
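A tiny check makes the tolerance property concrete: with θ = 1, 1 ≃ 2 and 2 ≃ 3 hold but 1 ≃ 3 does not. A minimal Python sketch over a small, arbitrarily chosen set of values; the helper names are hypothetical.

```python
from itertools import product


def within_one(a: float, b: float) -> bool:
    """a ≃θ b with θ = 1."""
    return abs(a - b) <= 1


def is_reflexive(values, rel):
    return all(rel(a, a) for a in values)


def is_symmetric(values, rel):
    return all(rel(a, b) == rel(b, a) for a, b in product(values, repeat=2))


def is_transitive(values, rel):
    # Fails as soon as rel(a, b) and rel(b, c) hold but rel(a, c) does not.
    return all(rel(a, c) for a, b, c in product(values, repeat=3) if rel(a, b) and rel(b, c))


VALUES = [1, 2, 3]
print(is_reflexive(VALUES, within_one), is_symmetric(VALUES, within_one),
      is_transitive(VALUES, within_one))  # True True False
```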

Project each pattern d ∈ D, i.e. $\psi(d) \sqsubseteq d$:
- for each dimension, replace each value with a larger interval
- this interval is computed from the value and its attribute domain:
  - “dilation”: ball of patterns of radius θ (similarity)
  - “erosion”: delete pairs of values violating the similarity (maximality)
  - compute the meet of the remaining values

The projection can be computed as a preprocessing step

Each projected pattern determines an equivalence class of similar values: this reduces the number of concepts.


Page 26

Projecting to change the lattice granularity

Figure: classical case (no projection)


Page 27

Projecting to change the lattice granularity

Figure: with θ = 0


Page 28

Projecting to change the lattice granularity

Figure: with θ = 1


Page 29

Projecting to change the lattice granularity

Figure: with θ = 2


Page 30

1 Formal Concept Analysis

2 Pattern structures and intervals

3 Introducing a similarity relation

4 Conclusion

Page 31

Conclusion

In a few words
- pattern structures for numerical data
- introducing a similarity relation

Other works
- links between binarization and projection of patterns
- algorithms for interval pattern structures
- interval data
- mining closed interval patterns and their generators
- enhancing information fusion with pattern structures
- mining bi-sets in numerical data

Applications
- gene expression data analysis
- evaluation of farmer practices
- recommendation systems (MovieLens)


Page 32

Some references

M. Kaytoue, S. Duplessis, S. O. Kuznetsov, and A. Napoli. Mining Gene Expression Data with Pattern Structures in Formal Concept Analysis. Information Sciences, Special Issue on Lattices, 2010.

M. Kaytoue, Z. Assaghir, A. Napoli, and S. O. Kuznetsov. Embedding Tolerance Relations in Formal Concept Analysis for Classifying Numerical Data. In 19th Conference on Information and Knowledge Management (CIKM), 2010.

Z. Assaghir, M. Kaytoue, A. Napoli, and H. Prade. Managing Information Fusion with Formal Concept Analysis. In Modeling Decisions for Artificial Intelligence, 6th International Conference (MDAI), 2010.

B. Ganter and S. O. Kuznetsov. Pattern Structures and Their Projections. In International Conference on Conceptual Structures (ICCS), LNCS 2120, Springer, 2001.

M. Kaytoue, S. Duplessis, S. O. Kuznetsov, and A. Napoli. Two FCA-Based Methods for Mining Gene Expression Data. In Formal Concept Analysis, LNCS 5548, Springer, pages 251–266, 2009.

M. Kaytoue, Z. Assaghir, N. Messai, and A. Napoli. Two Complementary Classification Methods for Designing a Concept Lattice from Interval Data. In Foundations of Information and Knowledge Systems, LNCS 5956, Springer, pages 345–362, 2010.

M. Kaytoue, S. Duplessis, and A. Napoli. Toward the Discovery of Itemsets with Significant Variations in Gene Expression Matrices. In Studies in Classification, Data Analysis, and Knowledge Organization, Springer, 2010.
