« Classification of brain connectivity graphs »
Romain Chion
supervised by: S. Achard, M. Desvignes, F. Forbes
gipsa-lab
OUTLINE
PRESENTATION OF THE CONTEXT
USUAL METHODS
NEW METHOD
COMPARISON OF RESULTS
CONTEXT
• How can graphs be compared with one another?
• Is it possible to model brain connectivity graphs (GCC)?
• To what extent can GCC be categorized?
GENERATIVE MODELS
Illustration « Small World », Collective dynamics of ‘small-world’ networks, D. J. Watts & S. H. Strogatz
Illustration « Preferential Attachment », Choice-driven phase transition in complex networks, P. L. Krapivsky and S. Redner
• Erdos-Renyi
• Forest Fire
• Kronecker
• Preferential Attachment
• Random k-regular
• Random Power Law
• Random Typing
• Small-World
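As an illustration of what a generative model is, here is a minimal sketch of two of the models listed above (Erdos-Renyi and Small-World), written with the Python standard library only. The function names, parameters (`n`, `p`, `k`, `beta`, `seed`) and the example sizes are assumptions made for this sketch; the slides do not say which generators or parameters were actually used.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Erdos-Renyi G(n, p): keep each possible edge independently with probability p."""
    rng = random.Random(seed)
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < p}


def small_world(n, k, beta, seed=None):
    """Watts-Strogatz: ring lattice of even degree k, each edge rewired with probability beta."""
    rng = random.Random(seed)
    # Ring lattice: every node is linked to its k/2 nearest neighbours on each side.
    edges = {frozenset((u, (u + j) % n)) for u in range(n) for j in range(1, k // 2 + 1)}
    for u in range(n):
        for j in range(1, k // 2 + 1):
            old = frozenset((u, (u + j) % n))
            if old in edges and rng.random() < beta:
                # New endpoint: any node that is neither u nor already linked to u.
                free = [w for w in range(n) if w != u and frozenset((u, w)) not in edges]
                if free:
                    edges.remove(old)
                    edges.add(frozenset((u, rng.choice(free))))
    return {tuple(sorted(e)) for e in edges}


def degrees(n, edges):
    """Degree of every node 0..n-1 for an undirected edge set."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


lattice = small_world(20, 4, beta=0.0)          # no rewiring: regular ring lattice
rewired = small_world(20, 4, beta=0.3, seed=1)  # some shortcuts, same number of edges
random_graph = erdos_renyi(20, 0.2, seed=1)
```

Rewiring replaces one edge by another, so the number of edges (and hence the density) stays fixed while the structure moves from regular towards random.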
GRAPH COMPARISON
STRUCTURAL MEASURES
• Transformation of one graph into another. E.g.: edit distance
LOCAL MEASURES (for each node)
• Tendency of nodes to cluster, degree distribution, paths between nodes. E.g.: clustering, shortest path
GLOBAL MEASURES
• Averaged local measures, formation of cores and communities. E.g.: assortativity, centrality, modularity, diameter
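To make the local / global distinction concrete, the sketch below computes the local clustering coefficient of every node and then averages it into a single global value. The toy graph (a triangle with one pendant node) and the helper names are assumptions chosen for illustration.

```python
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj):
    """Local measure: fraction of a node's neighbour pairs that are themselves linked."""
    out = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            out[node] = 0.0  # convention: undefined cases count as 0
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        out[node] = 2.0 * links / (k * (k - 1))
    return out


# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
local = local_clustering(adjacency(toy_edges))

# Global measure: the average of the local values (one number per graph).
global_clustering = sum(local.values()) / len(local)
```

Here the local values are 1, 1, 1/3 and 0, while the global value is their mean: very different nodes are summarized by one number.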
STATE OF THE ART: JANSSEN et al. 2012
Graphlet counting
Diagram (classifier training): Training set → graphlet counts → classifier
Diagram (classifier input): Graph instance → graphlet counts → classifier → graph model
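A minimal sketch of graphlet counting, restricted to the two connected 3-node graphlets (open path and triangle) for brevity. The slide does not state which graphlet sizes the cited method uses, so this restriction, the function name and the toy graph are assumptions. The resulting counts play the role of the feature vector fed to the classifier.

```python
from itertools import combinations


def count_3_graphlets(edges):
    """Count induced 3-node graphlets: open paths (wedges) and triangles."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    triples = 0  # neighbour pairs summed over centre nodes (all connected triples)
    closed = 0   # linked neighbour pairs summed over centre nodes (3 per triangle)
    for node, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])

    return {"open_path": triples - closed, "triangle": closed // 3}


# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
features = count_3_graphlets([(0, 1), (1, 2), (0, 2), (2, 3)])
```

On the toy graph the open paths are 0-2-3 and 1-2-3, and the only triangle is 0-1-2.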
STATE OF THE ART: MOTALLEBI et al. 2013
Complex Network Classifier
MODELING THE GCC
Characterization of the GCC against 4 models (Erdos-Renyi, Preferential Attachment, Random k-regular, Small-World)
Class     Prediction    E-R      P A      R k-R    S-W
Control   Small-World   0.2502   0.2501   0.2492   0.2505
Patient   Small-World   0.2502   0.2501   0.2492   0.2505
Characterization results with global measures and an SVM classifier
Confidence interval ~25%
IDENTIFICATION OF THE GCC
                true Control   true Patient   class precision
pred. Control   13             11             54.17%
pred. Patient   7              6              46.15%
class recall    65.00%         35.29%         50.16%
Identification results with global measures and an SVM classifier
Method accuracy of 50.16%, versus 50% for random guessing
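The class precision and class recall figures of the confusion matrix can be recomputed from its four counts. The sketch below does only that; the dictionary layout and function names are assumptions for illustration.

```python
# Counts from the confusion matrix: rows are predictions, columns are true classes.
confusion = {
    "Control": {"Control": 13, "Patient": 11},  # pred. Control
    "Patient": {"Control": 7, "Patient": 6},    # pred. Patient
}


def class_precision(matrix, label):
    """Correct predictions of `label` divided by all predictions of `label` (row sum)."""
    row = matrix[label]
    return row[label] / sum(row.values())


def class_recall(matrix, label):
    """Correct predictions of `label` divided by all true members of `label` (column sum)."""
    column_total = sum(row[label] for row in matrix.values())
    return matrix[label][label] / column_total


precision = {c: round(100 * class_precision(confusion, c), 2) for c in confusion}
recall = {c: round(100 * class_recall(confusion, c), 2) for c in confusion}
```

For example, precision for Control is 13 / (13 + 11) and recall for Patient is 6 / (11 + 6).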
PROBLEM STATEMENT
« Global measures are not representative of local behavior »
Histograms of the local clustering coefficient for 3 models
NORMALIZED HISTOGRAM
• Clustering Coefficient
• Characteristic Path Length
• Degree Distribution
• Efficiency
Diagram (training): Training set → histogram of local measures → normalized histogram → mean normalized histograms → graph model
Diagram (classification): Network instance → histogram of local measures → normalized histogram → distances between histograms → minimum of the distances or a classifier
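The pipeline in the diagram can be sketched as follows: each graph is reduced to a normalized histogram of one local measure, each model is represented by the mean histogram of its training graphs, and a new graph takes the label of the closest mean histogram. The bin count, the L1 distance, the model names "A" and "B" and the toy values are all assumptions for this sketch, not the settings used in the slides.

```python
def normalized_histogram(values, bins, lo=0.0, hi=1.0):
    """Histogram of values assumed to lie in [lo, hi], scaled so that it sums to 1."""
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)  # hi goes in the last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]


def mean_histogram(histograms):
    """Bin-wise average of several normalized histograms."""
    return [sum(column) / len(histograms) for column in zip(*histograms)]


def l1_distance(p, q):
    """Placeholder bin-to-bin distance; any histogram distance can be plugged in."""
    return sum(abs(a - b) for a, b in zip(p, q))


def classify(histogram, model_means, distance=l1_distance):
    """Return the model whose mean histogram is closest to the given histogram."""
    return min(model_means, key=lambda name: distance(histogram, model_means[name]))


# Toy training set: per-node values of a local measure for two graphs per model.
training = {
    "A": [[0.1, 0.2, 0.1, 0.9], [0.0, 0.3, 0.4, 0.8]],
    "B": [[0.9, 0.8, 0.7, 0.1], [0.6, 0.9, 1.0, 0.2]],
}
model_means = {
    name: mean_histogram([normalized_histogram(g, bins=2) for g in graphs])
    for name, graphs in training.items()
}
predicted = classify(normalized_histogram([0.2, 0.1, 0.3, 0.7], bins=2), model_means)
```

Normalizing by the number of nodes is what lets graphs of different sizes be compared through the same histogram bins.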
DISTANCE BETWEEN HISTOGRAMS
• Bin-to-bin (dis)similarity measures:
Bhattacharyya
Chi²
Hellinger
• Dissimilarity measures that preserve the shape of the histogram:
Earth Mover's Distance: the minimum work a road worker must do to turn one pile of earth into another
Match: comparison of the cumulative histograms
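The formulas did not survive on this slide, so the sketch below uses common textbook definitions of each distance for normalized histograms; the exact variants used in the work (for instance the factor 1/2 in Chi²) are an assumption. For one-dimensional histograms of equal total mass and unit spacing between adjacent bins, the Earth Mover's Distance reduces to the match distance, so a single function covers both here.

```python
import math


def bhattacharyya(p, q):
    """-ln of the Bhattacharyya coefficient sum_i sqrt(p_i * q_i)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.inf if bc == 0 else -math.log(bc)


def chi2(p, q):
    """Symmetric chi-square: 1/2 * sum_i (p_i - q_i)^2 / (p_i + q_i), empty bins skipped."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def hellinger(p, q):
    """sqrt(1/2 * sum_i (sqrt(p_i) - sqrt(q_i))^2), between 0 and 1."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def match(p, q):
    """Sum of absolute differences between the cumulative histograms (1-D EMD)."""
    total, cum_p, cum_q = 0.0, 0.0, 0.0
    for a, b in zip(p, q):
        cum_p += a
        cum_q += b
        total += abs(cum_p - cum_q)
    return total


p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
distances = {f.__name__: f(p, q) for f in (bhattacharyya, chi2, hellinger, match)}
```

Bin-to-bin measures only see that bins 0 and 2 differ; the match distance also accounts for how far the mass has to move along the axis, which is why it follows the shape of the histogram.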
SYNTHETIC DATA
Performance
graphlets: 78%
global measures: 88% to 97.3% (6 measures or more)
local measures: 86% or 100% (1 single measure)
Classification results
Histograms
Model     Precision
SW        100%
RPL       100%
RkR       100%
PA        100%
KG        100%
FF        100%
ER        100%
Overall   100%
Global measures
Model     Precision
SW        100%
RTG       96%
RPL       98%
PA        99%
KG        96%
FF        98%
ER        93%
Overall   97.2%
CONNECTIVITY GRAPHS
Confusion matrices for the Control / Patient identification
GLOBAL MEASURES, A.N.N.
      C     P
C     11    9     55%
P     5     12    71%
      69%   57%   63%
HISTOGRAM: CLUSTERING AND CHI²
      C     P
C     18    2     90%
P     4     13    76%
      82%   87%   83%
global measures: 63% vs. histograms: 83% max
MODELING THE GCC
Model   Clustering   Degrees
ER      0.418        0.133
FF      0.207        0.074
KG      0.112        0.211
RPL     0.156        0.088
PA      0.437        0.242
RkR     0.459        0.183
SW      0.103        0.238
EMD distance between the GCC and the models for two local measures
MISSING CLASS
Erdos-Renyi: FF, RPL
Forest Fire: RPL, SW
Kronecker Graph: FF 77% SW 23% RPL
Preferential Attachment: FF, RPL
Random k-Regular: FF, RPL
Random Power Law: FF 92% SW 8% PA
Small-World: FF, RPL
Connectivity Graphs: FF, RPL, PA, SW, …
ROBUSTNESS TO N AND D
Rows: increasing density d. Columns: increasing number of nodes n.
d \ n   100  200  300  400  500  600  700  800  900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000
0.01    11%  10%  10%  14%  14%  12%   7%   6%  14%  11%  23%  29%  29%  30%  29%  34%  34%  36%  36%  45%
0.02    12%  18%  18%  16%  20%  22%  30%  39%  41%  42%  43%  42%  42%  44%  42%  43%  43%  42%  42%  43%
0.03    10%  19%  20%  27%  28%  41%  41%  45%  43%  43%  43%  42%  41%  40%  44%  43%  44%  43%  43%  43%
0.04    17%  26%  32%  40%  43%  41%  44%  40%  43%  43%  43%  43%  43%  43%  45%  43%  43%  43%  43%  42%
0.05    16%  25%  41%  42%  41%  43%  42%  43%  38%  40%  43%  42%  42%  43%  42%  43%  42%  43%  43%  43%
0.06    33%  41%  43%  44%  43%  42%  42%  46%  41%  43%  43%  43%  42%  43%  43%  43%  49%  43%  44%  43%
0.07    36%  57%  54%  65%  62%  70%  67%  72%  71%  72%  69%  71%  68%  72%  85%  85%  83%  86%  84%  86%
0.08    44%  69%  72%  72%  72%  75%  69%  86%  84%  86%  86%  86%  86%  86%  86%  86%  84%  86%  86%  86%
0.09    41%  81%  85%  93%  96%  93%  90%  97%  94%  90%  86%  86%  84%  85%  71%  71%  70%  72%  71%  71%
0.10    49%  88%  86% 100%  96% 100%  99%  84%  81%  85%  86%  86%  86%  86%  86%  86%  86%  86%  86%  86%
0.11    52%  99%  93%  90%  89%  91%  92%  78%  74%  72%  71%  71%  69%  71%  71%  71%  71%  71%  71%  71%
0.12    62%  83%  85%  72%  71%  68%  72%  68%  74%  73%  72%  71%  71%  71%  72%  71%  71%  72%  71%  71%
0.13    62%  64%  70%  64%  68%  68%  71%  68%  67%  67%  70%  66%  69%  57%  65%  61%  56%  43%  48%  43%
0.14    59%  57%  48%  43%  43%  44%  44%  44%  43%  43%  43%  43%  43%  43%  43%  43%  43%  43%  43%  43%
0.15    54%  49%  45%  49%  42%  42%  45%  42%  42%  43%  43%  43%  43%  43%  43%  43%  43%  43%  43%  43%
0.16    45%  44%  43%  43%  44%  45%  42%  44%  43%  43%  43%  43%  43%  43%  43%  43%  43%  42%  43%  43%
0.17    42%  41%  41%  40%  42%  42%  42%  42%  43%  44%  43%  43%  42%  41%  41%  42%  42%  41%  39%  39%
0.18    45%  43%  44%  43%  43%  45%  42%  42%  43%  42%  43%  41%  41%  37%  40%  37%  34%  32%  31%  31%
0.19    44%  45%  43%  39%  43%  42%  41%  42%  42%  40%  36%  33%  30%  32%  29%  29%  29%  29%  29%  29%
0.20    43%  43%  41%  45%  41%  41%  40%  35%  30%  35%  31%  29%  29%  29%  29%  29%  29%  29%  29%  29%
ROBUSTNESS TO TRAINING
CROSS-VALIDATION over d (columns: increasing number of nodes n)
n            100  200  300  400  500  600  700  800  900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000
             67%  70%  67%  69%  70%  71%  73%  75%  74%  77%  77%  78%  77%  78%  79%  80%  80%  78%  79%  81%
CROSS-VALIDATION over n (columns: increasing density d)
d           0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.20
PREC         76%  82%  95%  97%  97%  97%  99%  99%  99%  99%  97%  99%  98%  99%  99%  99%  99%  99%  99%  99%
MIN PREC.    32%  31%  80%  88%  89%  91%  96%  96%  96%  96%  91%  93%  92%  94%  97%  96%  96%  98%  98%  98%
MIN CLASS.    ER   ER   FF   FF   KG   KG   KG   SW   SW   SW   KG   SW   SW   SW   SW   SW   SW   SW   SW   SW
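As a quick check on the two cross-validation series, the sketch below averages the precision values reported for cross-validation over d (one value per number of nodes) and the PREC row of cross-validation over n (one value per density). The variable names are assumptions; the numbers are copied from the slide.

```python
# Cross-validation over d: one precision (%) per number of nodes n = 100 ... 2000.
prec_over_d = [67, 70, 67, 69, 70, 71, 73, 75, 74, 77,
               77, 78, 77, 78, 79, 80, 80, 78, 79, 81]

# Cross-validation over n: PREC row (%), one value per density d = 0.01 ... 0.20.
prec_over_n = [76, 82, 95, 97, 97, 97, 99, 99, 99, 99,
               97, 99, 98, 99, 99, 99, 99, 99, 99, 99]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


mean_over_d = mean(prec_over_d)  # average precision when generalizing across densities
mean_over_n = mean(prec_over_n)  # average precision when generalizing across sizes
```

The averages come out at 75% and about 96%, which suggests that generalizing to an unseen density is harder than generalizing to an unseen number of nodes.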
RANDOMIZATION
Columns: increasing randomization
        0%   5%  10%  15%  20%  25%  30%  35%  40%  45%  50%  55%  60%  65%  70%  75%  80%  85%  90%
ER    100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
FF    100% 100%  97%  97% 100%  97% 100% 100% 100% 100% 100%  97% 100% 100% 100% 100% 100% 100% 100%
KG    100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
PA    100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
RkR   100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
RPL   100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
SW    100%   0%   0%   0%   0%   0%   0%   0%   0%   0%   0%   0%   0%   0%   0%   0%   0%   0%   0%
PCA: RESULTS
Component   STD DEV   %VAR    ΣVAR
PC 1        0.415     0.750   0.750
PC 2        0.170     0.126   0.876
PC 3        0.132     0.076   0.952
PC 4        0.101     0.044   0.996
PC 5        0.028     0.004   0.999
PC 6        0.011     0.000   1.000
PC 7        0.003     0.000   1.000
Plot axes: NUMBER OF PRINCIPAL COMPONENTS, CUMULATIVE VARIANCE
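The three columns of the table are linked: the share of variance of a component is its squared standard deviation divided by the sum of all squared standard deviations, and ΣVAR is the running total of those shares. The sketch below recomputes both from the STD DEV column; small differences in the last digit are expected because the standard deviations on the slide are rounded.

```python
# Standard deviations of the 7 principal components, copied from the table.
std_dev = [0.415, 0.170, 0.132, 0.101, 0.028, 0.011, 0.003]

# Variance of each component is the square of its standard deviation.
variances = [s * s for s in std_dev]
total_variance = sum(variances)

# Share of the total variance carried by each component (the %VAR column).
proportion = [v / total_variance for v in variances]

# Running total of the shares (the ΣVAR column).
cumulative = []
running = 0.0
for share in proportion:
    running += share
    cumulative.append(running)
```

The first three components already carry about 95% of the variance.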
PCA: ROBUSTNESS
CROSS-VALIDATION over d: 5 to 16% increase; the mean rises from 75% to 84%
14% 14% 14% 2% 0% 20% 11% 5% 14% 15% 14% 15% 15% 15% 15% 17% 17% 20% 23%
1% 14% 25% 15% 20% 28% 25% 24% 19% 36% 36% 37% 37% 40% 40% 55% 43% 42% 43%
6% 26% 31% 30% 35% 43% 46% 63% 62% 64% 63% 58% 60% 61% 67% 61% 66% 63% 61%
27% 34% 42% 45% 53% 55% 60% 66% 67% 67% 66% 68% 66% 64% 63% 51% 55% 57% 55%
31% 43% 48% 57% 59% 60% 63% 70% 66% 69% 71% 70% 70% 71% 71% 58% 70% 78% 70%
32% 51% 56% 69% 70% 66% 68% 72% 71% 74% 72% 86% 86% 85% 84% 71% 85% 86% 83%
34% 62% 68% 71% 70% 71% 83% 86% 87% 86% 86% 85% 84% 85% 86% 79% 84% 86% 86%
36% 67% 67% 79% 85% 86% 86% 86% 84% 86% 86% 86% 86% 86% 86% 86% 85% 86% 86%
40% 76% 94% 99% 100% 100% 98% 99% 99% 100% 100% 100% 99% 100% 94% 96% 96% 83% 82%
46% 96% 99% 100% 98% 100% 99% 98% 98% 98% 100% 100% 100% 100% 88% 88% 87% 88% 86%
52% 100% 96% 100% 99% 100% 100% 95% 94% 92% 93% 96% 92% 91% 86% 86% 86% 86% 86%
57% 98% 100% 99% 100% 98% 87% 73% 74% 77% 75% 72% 72% 73% 72% 71% 71% 72% 71%
58% 80% 85% 68% 73% 67% 70% 57% 55% 59% 57% 57% 57% 58% 58% 57% 57% 57% 57%
61% 64% 69% 64% 66% 61% 63% 59% 58% 58% 57% 57% 58% 57% 57% 57% 57% 58% 57%
65% 57% 67% 62% 59% 60% 58% 56% 58% 57% 57% 58% 57% 57% 58% 58% 58% 58% 57%
68% 59% 61% 53% 56% 57% 57% 58% 58% 57% 57% 57% 57% 57% 57% 57% 58% 57% 57%
66% 56% 56% 42% 45% 52% 57% 57% 57% 58% 57% 58% 57% 57% 57% 57% 57% 57% 57%
62% 57% 61% 43% 43% 47% 54% 58% 58% 55% 57% 57% 57% 57% 58% 58% 58% 57% 57%
60% 58% 57% 43% 43% 43% 44% 49% 54% 47% 46% 56% 57% 57% 57% 57% 57% 56% 57%
57% 59% 52% 46% 43% 42% 43% 42% 41% 43% 44% 43% 43% 43% 43% 43% 43% 43% 43%
11% 10% 10% 14% 14% 12% 7% 6% 14% 11% 23% 29% 29% 30% 29% 34% 34% 36% 36%
12% 18% 18% 16% 20% 22% 30% 39% 41% 42% 43% 42% 42% 44% 42% 43% 43% 42% 42%
10% 19% 20% 27% 28% 41% 41% 45% 43% 43% 43% 42% 41% 40% 44% 43% 44% 43% 43%
17% 26% 32% 40% 43% 41% 44% 40% 43% 43% 43% 43% 43% 43% 45% 43% 43% 43% 43%
16% 25% 41% 42% 41% 43% 42% 43% 38% 40% 43% 42% 42% 43% 42% 43% 42% 43% 43%
33% 41% 43% 44% 43% 42% 42% 46% 41% 43% 43% 43% 42% 43% 43% 43% 49% 43% 44%
36% 57% 54% 65% 62% 70% 67% 72% 71% 72% 69% 71% 68% 72% 85% 85% 83% 86% 84%
44% 69% 72% 72% 72% 75% 69% 86% 84% 86% 86% 86% 86% 86% 86% 86% 84% 86% 86%
41% 81% 85% 93% 96% 93% 90% 97% 94% 90% 86% 86% 84% 85% 71% 71% 70% 72% 71%
49% 88% 86% 100% 96% 100% 99% 84% 81% 85% 86% 86% 86% 86% 86% 86% 86% 86% 86%
52% 99% 93% 90% 89% 91% 92% 78% 74% 72% 71% 71% 69% 71% 71% 71% 71% 71% 71%
62% 83% 85% 72% 71% 68% 72% 68% 74% 73% 72% 71% 71% 71% 72% 71% 71% 72% 71%
62% 64% 70% 64% 68% 68% 71% 68% 67% 67% 70% 66% 69% 57% 65% 61% 56% 43% 48%
59% 57% 48% 43% 43% 44% 44% 44% 43% 43% 43% 43% 43% 43% 43% 43% 43% 43% 43%
54% 49% 45% 49% 42% 42% 45% 42% 42% 43% 43% 43% 43% 43% 43% 43% 43% 43% 43%
45% 44% 43% 43% 44% 45% 42% 44% 43% 43% 43% 43% 43% 43% 43% 43% 43% 42% 43%
42% 41% 41% 40% 42% 42% 42% 42% 43% 44% 43% 43% 42% 41% 41% 42% 42% 41% 39%
45% 43% 44% 43% 43% 45% 42% 42% 43% 42% 43% 41% 41% 37% 40% 37% 34% 32% 31%
44% 45% 43% 39% 43% 42% 41% 42% 42% 40% 36% 33% 30% 32% 29% 29% 29% 29% 29%
43% 43% 41% 45% 41% 41% 40% 35% 30% 35% 31% 29% 29% 29% 29% 29% 29% 29% 29%
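CROSS-VALIDATION over n: up to 5% increase; the mean rises from 96% to 97%
Tables above: AFTER (first 20 rows), BEFORE (last 20 rows); rows follow increasing density, columns follow increasing number of nodes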
PCA: INTERPRETATION
Biplot: visual representation
Axes: COMPONENT 1, COMPONENT 2
Groups: RANDOM POWER LAW, SMALL WORLD, FOREST FIRE, PREF ATTACHMENT, ERDOS RENYI, K REGULAR
Arrows: VECTORS OF THE ORIGINAL VARIABLES
CONCLUSION
Good performance on synthetic graphs
Local histograms are important
Local clustering is particularly informative
Results depend on the number and choice of models
Results on real data need further investigation
A combination of models should be considered