(in)formal concept analysis

Upload: kimmens

Post on 08-May-2015

DESCRIPTION

An informal and intuitive explanation of formal concept analysis

TRANSCRIPT

Page 1: (In)Formal Concept Analysis

Lecture Notes: (In)Formal concept analysis, 30/03/2009

Formal Concept Analysis

Prof. Kim Mens

Louvain School of Engineering
Department of Computing Science and Engineering

UCL

http://www.info.ucl.ac.be/~km

Page 2: (In)Formal Concept Analysis

Lecture Notes: (In)Formal concept analysis, 30/03/2009, Prof. Kim Mens – UCL, Belgium

Information explosion

IT advances in the last decade(s) have caused an explosion of information

E.g., growth of the internet

This leads to a real information overload

How to manage (i.e., search, structure) all that information?

Page 3: (In)Formal Concept Analysis

(Small) example

Dataset = someone’s iTunes™ music library

≥ 5000 songs each having a name, artist, rating, genre, ...

How to manage all that data?

How to find a song we like?

Can we find interesting relations between songs?

which songs are similar?

in what way are they similar?

Page 4: (In)Formal Concept Analysis

Managing large data sets

Given a data set with many thousands of elements:

web pages, text or other documents

data libraries (books, songs, movies, ...)

customer and personnel databases

having certain properties:

indexes, relevant keywords, tags, genres, ...

In general ...

Page 5: (In)Formal Concept Analysis

Managing large data sets

Given a data set with many thousands of elements:

web pages, text or other documents

data libraries (books, songs, movies, ...)

customer and personnel databases

Questions

1. How to find relevant data?

2. How to discover (hidden) structure in that data?

Page 6: (In)Formal Concept Analysis

Running example (revisited)

[Figure: table of songs versus genres]

Page 7: (In)Formal Concept Analysis

Running example

How to manage all those songs?

Three concrete applications:

1. Find a song based on its genre

2. Discover (un)expected dependencies between genres

• as well as the absence of expected dependencies

3. Discover a user profile

• e.g., what songs does she like most?

Page 8: (In)Formal Concept Analysis

A Google-like search engine for songs (Galois)

Genres (separated by spaces): party dance [search]

Page 9: (In)Formal Concept Analysis

A Google-like search engine for songs (Galois)

Genres (separated by spaces): party dance [search]

Search results [ party, dance ]:

• Technologic – Daft Punk
• Whole Again – Atomic Kitten
• Get Busy – Sean Paul
• Destination Calabria – Alex Gaudino
• Rock This Party – Bob Sinclar

Refine search by genres:

• [ slow, pop, soft ]
• [ beat ]

Remove genres from search:

• party
• dance

Page 10: (In)Formal Concept Analysis

A Google-like search engine for songs (Galois)

Genres (separated by spaces): party dance beat [search]

Search results [ party, dance, beat ]:

• Technologic – Daft Punk
• Get Busy – Sean Paul
• Destination Calabria – Alex Gaudino
• Rock This Party – Bob Sinclar

Refine search by genres:

• [ electronic ]
• [ reggae ]

Remove genres from search:

• party
• dance
• beat

Page 11: (In)Formal Concept Analysis

A Google-like search engine for songs (Galois)

Genres (separated by spaces): party dance beat reggae [search]

Search results [ party, dance, beat, reggae ]:

• Get Busy – Sean Paul

Remove genres from search:

• party
• dance
• beat
• reggae

Page 12: (In)Formal Concept Analysis

A Google-like search engine for songs (Galois)

Genres (separated by spaces): party reggae [search]

Search results [ party, reggae ]:

• Could You Be Loved – Bob Marley

Refine search by genres:

• [ dance, beat ]

Remove genres from search:

• party
• reggae

Page 13: (In)Formal Concept Analysis

Running example

How to manage all those songs?

Three concrete applications:

1. Find a song based on its genre

2. Discover (un)expected dependencies between genres

• as well as the absence of expected dependencies

3. Discover a user profile

• what songs does she like most?

Page 14: (In)Formal Concept Analysis

Structure of the world-wide music scene

http://sixdegrees.hu/last.fm/index.html

Page 15: (In)Formal Concept Analysis

Dependencies between genres

New wave is so eighties

Dance music is party music

Disco is from the seventies

Classical music and slows are for softies

...

Page 16: (In)Formal Concept Analysis

Running example

How to manage all those songs?

Three concrete applications:

1. Find a song based on its genre

2. Discover (un)expected dependencies between genres

• as well as the absence of expected dependencies

3. Discover a user profile

• what songs does she like most?

Page 17: (In)Formal Concept Analysis

Discover a user profile

To analyse the preferred genres of a user

for match-making or publicity purposes

For example,

most of her music is party music

she likes background music

she’s not such a big fan of classical

none of her music is hard

Page 18: (In)Formal Concept Analysis

Running example

How to manage all those songs?

Three concrete applications:

1. Find a song based on its genre

2. Discover (un)expected dependencies between genres

• as well as the absence of expected dependencies

3. Discover a user profile

• what songs does she like most?

So how can we achieve all this?

Page 19: (In)Formal Concept Analysis

Formal concept analysis...

... may be of help

FCA was invented around 1980 in Darmstadt as a mathematical theory for modelling the notion of a “concept”

Since then it has been applied in many domains of computer science dealing with large data sets

data analysis

knowledge discovery

software engineering

Page 20: (In)Formal Concept Analysis

A data set is represented by a “context”

[Figure: a context, showing its objects, its attributes and the relation between them]

Page 21: (In)Formal Concept Analysis

Formal concept analysis...

Starts from a context C = (G, M, I), consisting of

a set G of objects

a set M of attributes

a relation I between the objects and the attributes

Determines concepts

Maximal groups of objects and attributes

Plus hierarchical relationships

Subset relationships between those groups
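To make the context and the “groups” concrete, here is a minimal Python sketch of a context C = (G, M, I) and of the two operations FCA is built on: collecting the attributes shared by a set of objects (written A′ in Ganter and Wille) and the objects sharing a set of attributes (B′). The dictionary representation, the names `CONTEXT`, `common_attributes` and `common_objects`, and the genre assignments of the toy songs are all assumptions made for illustration; they are not the lecture's data set.

```python
# Hypothetical toy context C = (G, M, I): each object (song) in G is mapped
# to the set of attributes (genres) in M it is related to by I.
# Genre assignments are illustrative assumptions, not the lecture's data set.
CONTEXT = {
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
    "Technologic": {"party", "electronic", "dance", "beat"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "Could You Be Loved": {"party", "reggae"},
}
ATTRIBUTES = set().union(*CONTEXT.values())  # the attribute set M


def common_attributes(objects):
    """A': the attributes shared by every object in `objects`."""
    shared = set(ATTRIBUTES)  # by convention, the empty object set shares all of M
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def common_objects(attributes):
    """B': the objects that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


print(common_objects({"party", "reggae"}))       # Get Busy and Could You Be Loved
print(common_attributes({"Alice", "A Forest"}))  # new wave, party and eighties
```

With these two operations, a pair (A, B) is a concept exactly when `common_attributes(A) == B` and `common_objects(B) == A`.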

Page 22: (In)Formal Concept Analysis

A “concept” represents a group of related objects and attributes

Intuitively, we look for maximal “rectangles” in the binary relation I
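The “maximal rectangles” can be found naively by closing every set of attributes B into the pair (B′, B″): every concept is reached this way, because its intent is already closed. The sketch below assumes a hypothetical toy context and helper names; its cost is exponential in the number of attributes, so practical tools use smarter algorithms such as Ganter's NextClosure.

```python
from itertools import combinations

# Hypothetical toy context (song -> genres); genre assignments are assumptions.
CONTEXT = {
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
    "Technologic": {"party", "electronic", "dance", "beat"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "Could You Be Loved": {"party", "reggae"},
}
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))


def extent(attributes):
    """B': objects having all given attributes."""
    return frozenset(g for g, attrs in CONTEXT.items() if set(attributes) <= attrs)


def intent(objects):
    """A': attributes shared by all given objects."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return frozenset(shared)


def all_concepts():
    """Naive enumeration: close every attribute subset B into (B', B'').

    Every concept (A, B) satisfies B = B'', so it is produced when B itself
    is closed; the set removes duplicates. Exponential in |M|: toy use only.
    """
    concepts = set()
    for size in range(len(ATTRIBUTES) + 1):
        for subset in combinations(ATTRIBUTES, size):
            objects = extent(subset)
            concepts.add((objects, intent(objects)))
    return concepts


for objects, attributes in sorted(all_concepts(), key=lambda c: -len(c[0])):
    print(sorted(objects), sorted(attributes))
```

For this toy context the enumeration yields seven concepts, from the top concept (all five songs, {party}) down to the bottom concept (no songs, all seven genres).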

Page 23: (In)Formal Concept Analysis

A concept

Objects: Alice – Sisters of Mercy, A Forest – The Cure

Attributes: New Wave, Party, Eighties

A concept is a maximal group of objects and attributes

Group:

Every object of the concept has those attributes

Every attribute of the concept holds for those objects

Maximal:

No other object (outside the concept) has those same attributes

No other attribute (outside the concept) is shared by these objects
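The two conditions on this slide can be checked literally. A minimal sketch, assuming a hypothetical toy context that contains the concept from the slide plus two unrelated songs (genre assignments are illustrative, and `is_concept` is a name invented for this example):

```python
# Hypothetical toy context (song -> genres); genre assignments are assumptions.
CONTEXT = {
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
    "Technologic": {"party", "electronic", "dance", "beat"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def is_concept(objects, attributes):
    """Check the slide's conditions literally: 'group' and 'maximal'."""
    objects, attributes = set(objects), set(attributes)
    # Group: every object of the concept has all of its attributes.
    if not all(attributes <= CONTEXT[g] for g in objects):
        return False
    # Maximal (objects): no object outside the group has all these attributes.
    if any(attributes <= CONTEXT[g] for g in CONTEXT.keys() - objects):
        return False
    # Maximal (attributes): no attribute outside the group is shared by all objects.
    if any(all(m in CONTEXT[g] for g in objects) for m in ATTRIBUTES - attributes):
        return False
    return True


print(is_concept({"Alice", "A Forest"}, {"new wave", "party", "eighties"}))  # True
print(is_concept({"Alice"}, {"new wave", "party", "eighties"}))              # False
print(is_concept({"Alice", "A Forest"}, {"new wave", "party"}))              # False
```

Taken together, the three checks amount to requiring A′ = B and B′ = A.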

Page 24: (In)Formal Concept Analysis

Not a concept

[Figure: a group that is not maximal, annotated “Need to include this” and “Need to include this as well”]

Intuitively, we look for maximal “rectangles” in the binary relation I
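One way to turn such a non-maximal group into a concept is to close it from the object side: first take all attributes the objects share (A′), then all objects having those attributes (A″). The result (A″, A′) is the smallest concept containing the group; closing from the attribute side, (B′, B″), is the dual option and can give a different concept. A sketch, assuming a hypothetical toy context and a helper named `complete` invented for this example:

```python
# Hypothetical toy context (song -> genres); genre assignments are assumptions.
CONTEXT = {
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
    "Technologic": {"party", "electronic", "dance", "beat"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def extent(attributes):
    """B': objects having all given attributes."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def intent(objects):
    """A': attributes shared by all given objects."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def complete(objects, attributes):
    """Grow a filled rectangle (objects x attributes) into the concept (A'', A')."""
    if not all(set(attributes) <= CONTEXT[g] for g in objects):
        raise ValueError("not a filled rectangle of the relation")
    shared = intent(objects)       # A': every attribute the objects have in common
    return extent(shared), shared  # A'': every object having those attributes


objs, attrs = complete({"Alice"}, {"new wave", "party"})
print(sorted(objs), sorted(attrs))
# ['A Forest', 'Alice'] ['eighties', 'new wave', 'party']
```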

Page 25: (In)Formal Concept Analysis

Formal concept analysis...

... derives hierarchies of concepts from data sets

It generates and visualizes hierarchies of concepts on a mathematically founded basis

Page 26: (In)Formal Concept Analysis

A concept hierarchy

Page 27: (In)Formal Concept Analysis

Yet another concept

Page 28: (In)Formal Concept Analysis

A subconcept

The blue concept is a subconcept of the green one.

Page 29: (In)Formal Concept Analysis

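A subconcept

({Technologic, Destination Calabria, Rock This Party}, {Party, Electronic, Dance, Beat})

is subconcept of

({Technologic, In Da Club, Get Busy, Destination Calabria, Rock This Party}, {Party, Dance, Beat})

The objects of the subconcept are a subset of the objects of the superconcept, and the attributes of the superconcept are a subset of the attributes of the subconcept.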
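The subconcept test itself only compares sets. The sketch below encodes the two concepts exactly as listed on this slide; the variable names and the function `is_subconcept` are hypothetical.

```python
# The two concepts from the slide, as (extent, intent) pairs of Python sets.
party_dance_beat = (
    {"Technologic", "In Da Club", "Get Busy", "Destination Calabria", "Rock This Party"},
    {"Party", "Dance", "Beat"},
)
party_electronic_dance_beat = (
    {"Technologic", "Destination Calabria", "Rock This Party"},
    {"Party", "Electronic", "Dance", "Beat"},
)


def is_subconcept(c1, c2):
    """(A1, B1) is a subconcept of (A2, B2) iff A1 ⊆ A2 (equivalently, B2 ⊆ B1).

    For genuine concepts the two conditions imply each other; both are checked
    here only to mirror the two 'is subset of' arrows on the slide.
    """
    (a1, b1), (a2, b2) = c1, c2
    return a1 <= a2 and b2 <= b1


print(is_subconcept(party_electronic_dance_beat, party_dance_beat))  # True
print(is_subconcept(party_dance_beat, party_electronic_dance_beat))  # False
```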

Page 30: (In)Formal Concept Analysis

Concept lattice

For a given context, the set of all formal concepts, together with the partial order “is subconcept of”, forms a lattice

A lattice is a mathematical structure with some interesting properties:

any two concepts always have a greatest common subconcept and a least common superconcept

it is even a complete lattice: every set of concepts has a greatest common subconcept and a least common superconcept; in particular, a unique top element (the least common superconcept of all concepts) and a unique bottom element (the greatest common subconcept of all concepts) exist
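For concepts (A1, B1) and (A2, B2), the least common superconcept is ((A1 ∪ A2)″, B1 ∩ B2) and the greatest common subconcept is (A1 ∩ A2, (B1 ∪ B2)″), following Ganter and Wille. A sketch of both operations on a hypothetical toy context (the genre assignments and the function names `concept_of`, `join` and `meet` are assumptions):

```python
# Hypothetical toy context (song -> genres); genre assignments are assumptions.
CONTEXT = {
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
    "Technologic": {"party", "electronic", "dance", "beat"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "Could You Be Loved": {"party", "reggae"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def extent(attributes):
    """B': objects having every attribute in `attributes`."""
    return frozenset(g for g, attrs in CONTEXT.items() if attributes <= attrs)


def intent(objects):
    """A': attributes shared by every object in `objects`."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return frozenset(shared)


def concept_of(objects):
    """The smallest concept whose extent contains `objects`: (A'', A')."""
    b = intent(objects)
    return extent(b), b


def join(c1, c2):
    """Least common superconcept: ((A1 | A2)'', B1 & B2)."""
    (_, b1), (_, b2) = c1, c2
    shared = b1 & b2
    return extent(shared), shared  # (B1 & B2)' equals (A1 | A2)''


def meet(c1, c2):
    """Greatest common subconcept: (A1 & A2, (B1 | B2)'')."""
    (a1, _), (a2, _) = c1, c2
    common = a1 & a2
    return common, intent(common)  # (A1 & A2)' equals (B1 | B2)''


dance_beat = join(concept_of({"Technologic"}), concept_of({"Get Busy"}))
print(sorted(dance_beat[0]), sorted(dance_beat[1]))
reggae_dance = meet(dance_beat, concept_of({"Could You Be Loved"}))
print(sorted(reggae_dance[0]), sorted(reggae_dance[1]))
```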

Page 31: (In)Formal Concept Analysis

A concept lattice

Page 32: (In)Formal Concept Analysis

A concept lattice

[Figure: concept lattice highlighting the concept ({Alice – Sisters of Mercy, A Forest – The Cure}, {New Wave, Party, Eighties}) and the concept ({Technologic, Destination Calabria, Rock This Party}, {Party, Electronic, Dance, Beat}), which is a subconcept of ({Technologic, In Da Club, Get Busy, Destination Calabria, Rock This Party}, {Party, Dance, Beat})]

Page 33: (In)Formal Concept Analysis

Tool support: Concept Explorer

Page 34: (In)Formal Concept Analysis

A concept lattice in detail (sparse labelling)
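In sparse (reduced) labelling, each object appears only once, at its object concept (g″, g′), and each attribute only once, at its attribute concept (m′, m″). The extent of a node is then read off by collecting the object labels at that node and below it, and its intent by collecting the attribute labels at that node and above it. A sketch that computes these labels for a hypothetical toy context (names and genre assignments are assumptions):

```python
# Hypothetical toy context (song -> genres); genre assignments are assumptions.
CONTEXT = {
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
    "Technologic": {"party", "electronic", "dance", "beat"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "Could You Be Loved": {"party", "reggae"},
}
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))


def extent(attributes):
    """B': objects having all given attributes."""
    return frozenset(g for g, attrs in CONTEXT.items() if set(attributes) <= attrs)


def intent(objects):
    """A': attributes shared by all given objects."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return frozenset(shared)


def sparse_labels():
    """Map each labelled concept (extent, intent) to (object labels, attribute labels)."""
    labels = {}
    for g in CONTEXT:
        b = intent({g})                    # g'
        key = (extent(b), b)               # object concept (g'', g')
        labels.setdefault(key, ([], []))[0].append(g)
    for m in ATTRIBUTES:
        a = extent({m})                    # m'
        key = (a, intent(a))               # attribute concept (m', m'')
        labels.setdefault(key, ([], []))[1].append(m)
    return labels


for (objs, attrs), (obj_labels, attr_labels) in sparse_labels().items():
    print(sorted(attrs), "objects:", obj_labels, "attributes:", attr_labels)
```

Concepts that are neither object nor attribute concepts (such as the bottom concept here) carry no label at all, which is what keeps a sparsely labelled lattice readable.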

Page 35: (In)Formal Concept Analysis

Running example revisited: how does it work?

How to manage all those songs?

Three concrete applications:

1. Find a song based on its genre

2. Discover (un)expected dependencies between genres

• as well as the absence of expected dependencies

3. Discover a user profile

• e.g., what songs does she like most?

Page 36: (In)Formal Concept Analysis

A Google-like search engine for songs (Galois)

Genres (separated by spaces): party dance [search]

Search results [ party, dance ]:

• Technologic – Daft Punk
• Whole Again – Atomic Kitten
• Get Busy – Sean Paul
• Destination Calabria – Alex Gaudino
• Rock This Party – Bob Sinclar

Refine search by genres:

• [ slow, pop, soft ]
• [ beat ]

Remove genres from search:

• party
• dance
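One plausible way to implement such a search engine on top of a concept lattice (a hypothetical sketch, not necessarily how Galois works): the results for a query Q are the songs in Q′, and the refinement suggestions are the extra genres of the direct subconcepts of (Q′, Q″), skipping refinements that would return no songs. The direct subconcepts are found by closing Q″ ∪ {m} for every other genre m and keeping the candidates with maximal extents. The toy context below is an assumption loosely based on the songs shown on these slides, and `search` is a name invented for this example.

```python
# Hypothetical toy context (song -> genres), loosely based on the songs in the
# search screens above; the genre assignments are assumptions.
CONTEXT = {
    "Technologic": {"party", "dance", "beat", "electronic"},
    "Whole Again": {"party", "dance", "slow", "pop", "soft"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "Destination Calabria": {"party", "dance", "beat", "electronic"},
    "Rock This Party": {"party", "dance", "beat", "electronic"},
    "Could You Be Loved": {"party", "reggae"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def extent(attributes):
    """Songs having every genre in `attributes`."""
    return frozenset(g for g, attrs in CONTEXT.items() if attributes <= attrs)


def intent(objects):
    """Genres shared by every song in `objects`."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return frozenset(shared)


def search(query):
    """Return (matching songs, refinement suggestions) for a set of genres."""
    results = extent(set(query))
    closed = intent(results)  # intent of the concept the query lands on
    # Each extra genre m leads to a smaller concept; drop those with no songs.
    candidates = []
    for m in ATTRIBUTES - closed:
        songs = extent(closed | {m})
        if songs:
            candidates.append((songs, intent(songs)))
    # The direct subconcepts are the candidates with maximal extents.
    refinements = []
    for songs, genres in candidates:
        if not any(songs < other for other, _ in candidates):
            extra = sorted(genres - closed)
            if extra not in refinements:
                refinements.append(extra)
    return sorted(results), sorted(refinements)


print(search({"party", "dance"}))
print(search({"party", "dance", "beat"}))
```

On this toy data, searching for party and dance suggests the refinements [beat] and [pop, slow, soft], and adding beat suggests [electronic] and [reggae], matching the first search screens up to ordering; the result lists for other queries depend on the real data set.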

Page 37: (In)Formal Concept Analysis

A Google-like search engine for songs (Galois)

Genres (separated by spaces): party dance beat [search]

Search results [ party, dance, beat ]:

• Technologic – Daft Punk
• Get Busy – Sean Paul
• Destination Calabria – Alex Gaudino
• Rock This Party – Bob Sinclar

Refine search by genres:

• [ electronic ]
• [ reggae ]

Remove genres from search:

• party
• dance
• beat

Page 38: (In)Formal Concept Analysis

A Google-like search engine for songs (Galois)

Genres (separated by spaces): party dance beat reggae [search]

Search results [ party, dance, beat, reggae ]:

• Get Busy – Sean Paul

Remove genres from search:

• party
• dance
• beat
• reggae

Page 39: (In)Formal Concept Analysis

A Google-like search engine for songs (Galois)

Genres (separated by spaces): party reggae [search]

Search results [ party, reggae ]:

• Could You Be Loved – Bob Marley

Refine search by genres:

• [ dance, beat ]

Remove genres from search:

• party
• reggae

Page 40: (In)Formal Concept Analysis

Running example revisited: how does it work?

How to manage all those songs?

Three concrete applications:

1. Find a song based on its genre

2. Discover (un)expected dependencies between genres

• as well as the absence of expected dependencies

3. Discover a user profile

• e.g., what songs does she like most?

Page 41: (In)Formal Concept Analysis

Implications

New wave is from the eighties

Dance music is party music

Disco is from the seventies

Slows are soft

Classical music is soft
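An implication X → Y holds in a context when every object having all attributes of X also has all attributes of Y, or equivalently when Y ⊆ X″. A sketch on a hypothetical toy context (the songs' genres and the function name `holds` are assumptions):

```python
# Hypothetical toy context (song -> genres); genre assignments are assumptions.
CONTEXT = {
    "Alice": {"new wave", "eighties", "party"},
    "A Forest": {"new wave", "eighties", "party"},
    "Technologic": {"dance", "party", "beat", "electronic"},
    "Whole Again": {"dance", "party", "slow", "pop", "soft"},
    "Get Busy": {"dance", "party", "beat", "reggae"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def extent(attributes):
    """B': objects having all given attributes."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def intent(objects):
    """A': attributes shared by all given objects."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def holds(premise, conclusion):
    """premise -> conclusion holds iff conclusion is a subset of premise''."""
    return set(conclusion) <= intent(extent(premise))


print(holds({"new wave"}, {"eighties"}))  # True
print(holds({"dance"}, {"party"}))        # True
print(holds({"slow"}, {"soft"}))          # True
print(holds({"party"}, {"dance"}))        # False: Alice is party but not dance
```

Note that an implication also holds, vacuously, when no object has the premise.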

Page 42: (In)Formal Concept Analysis

Implications

Slows are soft

Classical music is soft

Disco is from the seventies

Dance music is party music

New wave is from the eighties

Page 43: (In)Formal Concept Analysis

Associations

Most dance music has a beat

Most of her music is party music

A lot of music from the eighties is party music
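Associations are implications that hold for most, but not necessarily all, objects. A common way to quantify them, borrowed from association-rule mining, is the confidence |(X ∪ Y)′| / |X′| and the support |(X ∪ Y)′| / |G|; an implication is simply an association with confidence 1. A sketch, assuming a hypothetical toy context and the function names `support` and `confidence`:

```python
# Hypothetical toy context (song -> genres); genre assignments are assumptions.
CONTEXT = {
    "Technologic": {"party", "dance", "beat", "electronic"},
    "Whole Again": {"party", "dance", "slow", "pop", "soft"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "Destination Calabria": {"party", "dance", "beat", "electronic"},
    "Rock This Party": {"party", "dance", "beat", "electronic"},
    "Could You Be Loved": {"party", "reggae"},
}


def extent(attributes):
    """Objects having every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def support(attributes):
    """Fraction of all objects that have every attribute in `attributes`."""
    return len(extent(attributes)) / len(CONTEXT)


def confidence(premise, conclusion):
    """Fraction of the objects having `premise` that also have `conclusion`."""
    base = extent(premise)
    if not base:
        raise ValueError("no object has the premise; confidence is undefined")
    return len(extent(set(premise) | set(conclusion))) / len(base)


print(confidence({"dance"}, {"beat"}))  # 0.8: 4 of the 5 dance songs have a beat
print(support({"dance", "beat"}))
```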

Page 44: (In)Formal Concept Analysis

Running example revisited: how does it work?

How to manage all those songs?

Three concrete applications:

1. Find a song based on its genre

2. Discover (un)expected dependencies between genres

• as well as the absence of expected dependencies

3. Discover a user profile

• e.g., what songs does she like most?

Page 45: (In)Formal Concept Analysis

Concept lattice (with number of objects)

Preferred music is party music

Also likes some background music

Not such a big fan of classical

and so on ...
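Such a profile can be read off from the sizes of the attribute concepts: assuming the numbers on the lattice are extent sizes, the concept of genre m contains |m′| songs, so the genre whose concept holds most of the library dominates the profile. A sketch with a hypothetical toy library (song names from the slides, genres assumed; `genre_counts` is a name invented for this example):

```python
# Hypothetical toy library (song -> genres); genre assignments are assumptions.
LIBRARY = {
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
    "Technologic": {"party", "electronic", "dance", "beat"},
    "Whole Again": {"party", "dance", "slow", "pop", "soft"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "Could You Be Loved": {"party", "reggae"},
}
GENRES = set().union(*LIBRARY.values())


def genre_counts():
    """|m'| for every genre m: the number of songs in its attribute concept."""
    counts = {m: sum(m in genres for genres in LIBRARY.values()) for m in GENRES}
    # Largest extents first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for genre, count in genre_counts():
    print(f"{genre}: {count} of {len(LIBRARY)} songs")
```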

Page 46: (In)Formal Concept Analysis

Some problems...

The concept lattice can get very dense for large data sets

The concept lattice can grow exponentially in the size of the context

Attributes are not always binary

What if data is incomplete or imprecise?

False positives and negatives

...

(Some solutions have been proposed to overcome these problems)
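The exponential blow-up is easy to reproduce with the contranominal scale, a standard worst case in which n objects and n attributes are related exactly when they differ: every subset of attributes is then closed, so the lattice has 2^n concepts. The sketch below counts concepts naively; the function names are hypothetical.

```python
from itertools import combinations


def contranominal(n):
    """Context with objects 0..n-1 and attributes 0..n-1 where g I m iff g != m."""
    return {g: {m for m in range(n) if m != g} for g in range(n)}


def count_concepts(context, attributes):
    """Count concepts naively by closing every subset of the attributes."""
    attributes = list(attributes)
    concepts = set()
    for size in range(len(attributes) + 1):
        for subset in combinations(attributes, size):
            objects = frozenset(g for g, attrs in context.items() if set(subset) <= attrs)
            shared = set(attributes)
            for g in objects:
                shared &= context[g]
            concepts.add((objects, frozenset(shared)))
    return len(concepts)


for n in range(1, 9):
    # prints n followed by 2 ** n: a context of size n x n yields 2 ** n concepts
    print(n, count_concepts(contranominal(n), range(n)))
```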

Page 47: (In)Formal Concept Analysis

Conclusion

FCA is an interesting technique to analyse large data sets

especially to discover interesting concepts, relations and structures in the data

Can be applied to many application domains

Based on a formal mathematical theory

Yet easy to use and understand intuitively

Quality of results depends on size and quality of the data

Page 48: (In)Formal Concept Analysis

Sources

B. Ganter, R. Wille: Formal Concept Analysis – Mathematical Foundations. Springer, Heidelberg, 1999

Uta Priss’ Formal Concept Analysis Homepage

http://www.upriss.org.uk/fca/fca.html

Gerd Stumme’s course “Formale Begriffsanalyse”

http://www.kde.cs.uni-kassel.de/lehre/ss2005/formale_begriffsanalyse

Concept Explorer (ConExp)

http://conexp.sourceforge.net/

J. Fallon: Application des treillis de Galois à la recherche d’informations. Master’s thesis, Université catholique de Louvain, Département d’Ingénierie Informatique, 2004
