link discovery tutorial part ii: accuracy
TRANSCRIPT
Link Discovery TutorialPart II: Accuracy
Axel-Cyrille Ngonga Ngomo(1), Irini Fundulaki(2), Mohamed Ahmed Sherif(1)
(1) Institute for Applied Informatics, Germany(2) FORTH, Greece
October 18th, 2016Kobe, Japan
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 1 / 54
Table of Contents
1 Introduction
2 Raven
3 Eagle
4 Coala
5 Summary and Conclusion
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 2 / 54
Table of Contents
1 Introduction
2 Raven
3 Eagle
4 Coala
5 Summary and Conclusion
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 3 / 54
IntroductionLink Discovery as Classification Task
Definition (Declarative Link Discovery)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : R(s, t)Here, find M ′ = (s, t) ∈ S × T : δ(s, t) ≥ τ
Definition (Classification perspective)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : C(s, t) = +1Here, C(s, t) = +1↔ σ(s, t) ≥ θ
Classical machine learning problem [Ngo+11; NL12]Dedicated techniques perform betterUnsupervised, active and unsupervised techniques possible
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 4 / 54
IntroductionLink Discovery as Classification Task
Definition (Declarative Link Discovery)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : R(s, t)Here, find M ′ = (s, t) ∈ S × T : δ(s, t) ≥ τ
Definition (Classification perspective)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : C(s, t) = +1Here, C(s, t) = +1↔ σ(s, t) ≥ θ
Classical machine learning problem [Ngo+11; NL12]Dedicated techniques perform betterUnsupervised, active and unsupervised techniques possible
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 4 / 54
IntroductionLink Discovery as Classification Task
Definition (Declarative Link Discovery)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : R(s, t)Here, find M ′ = (s, t) ∈ S × T : δ(s, t) ≥ τ
Definition (Classification perspective)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : C(s, t) = +1Here, C(s, t) = +1↔ σ(s, t) ≥ θ
Classical machine learning problem [Ngo+11; NL12]Dedicated techniques perform betterUnsupervised, active and unsupervised techniques possible
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 4 / 54
IntroductionChallenge
Challenges1 Creation of labeled training data tedious2 Need automated means for automatic class and property matching3 Need for efficient execution of link specifications4 Dedicated machine learning approaches necessary
Solutions1 Use active learning approach for link discovery2 Rely on hospital/resident algorithm3 See previous section4 Topic of this section
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 5 / 54
IntroductionChallenge
Challenges1 Creation of labeled training data tedious2 Need automated means for automatic class and property matching3 Need for efficient execution of link specifications4 Dedicated machine learning approaches necessary
Solutions1 Use active learning approach for link discovery2 Rely on hospital/resident algorithm3 See previous section4 Topic of this section
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 5 / 54
Table of Contents
1 Introduction
2 Raven
3 Eagle
4 Coala
5 Summary and Conclusion
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 6 / 54
RAVENApproach
Definition (Classification perspective)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : C(s, t) = +1Here, C(s, t) = +1↔ σ(s, t) ≥ θ
Learning classifier C involves learning1 Two sets of restrictions that specify the sets S resp. T,2 the components σ1 . . . σn of a complex similarity measure σ3 a set of thresholds θ1, ..., θn for σ1, . . . , σn
AssumptionsRestrictions are class restrictionsClassifier shape is given (e.g., linear combination)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 7 / 54
RAVENApproach
Definition (Classification perspective)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : C(s, t) = +1Here, C(s, t) = +1↔ σ(s, t) ≥ θ
Learning classifier C involves learning1 Two sets of restrictions that specify the sets S resp. T,2 the components σ1 . . . σn of a complex similarity measure σ3 a set of thresholds θ1, ..., θn for σ1, . . . , σn
AssumptionsRestrictions are class restrictionsClassifier shape is given (e.g., linear combination)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 7 / 54
RAVENApproach
Definition (Classification perspective)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : C(s, t) = +1Here, C(s, t) = +1↔ σ(s, t) ≥ θ
Learning classifier C involves learning1 Two sets of restrictions that specify the sets S resp. T,2 the components σ1 . . . σn of a complex similarity measure σ3 a set of thresholds θ1, ..., θn for σ1, . . . , σn
AssumptionsRestrictions are class restrictionsClassifier shape is given (e.g., linear combination)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 7 / 54
RAVENApproach
Class and Property RestrictionsDefine class similarity functionSolve corresponding hospital-resident problemBased on extension of stable marriage problem
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 8 / 54
RAVENApproach
Class and Property RestrictionsDefine class similarity functionSolve corresponding hospital-resident problemBased on extension of stable marriage problem
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 8 / 54
RAVENApproach
Class and Property RestrictionsDefine class similarity functionSolve corresponding hospital-resident problemBased on extension of stable marriage problem
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 9 / 54
RAVENApproach
Class and Property RestrictionsDefine class similarity functionSolve corresponding hospital-resident problemBased on extension of stable marriage problem
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 10 / 54
RAVENApproach
Class and Property RestrictionsDefine class similarity functionSolve corresponding hospital-resident problemBased on extension of stable marriage problem
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 11 / 54
RAVENApproach
Class RestrictionsSimilarity function
String similarityNumber of shared property values amongst instances. . .
Solve corresponding hospital-resident problem
Source Target S TDrugbank Disesome Targets GenesSider Diseasome Side-Effect Diseases
DBpedia Dailymed Organization OrganizationSider Dailymed Drugs Offer
Drugbank DBpedia Targets ProteinProperty mapping similarLeads to σ1 . . . σn
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 12 / 54
RAVENApproach
Class RestrictionsSimilarity function
String similarityNumber of shared property values amongst instances. . .
Solve corresponding hospital-resident problem
Source Target S TDrugbank Disesome Targets GenesSider Diseasome Side-Effect Diseases
DBpedia Dailymed Organization OrganizationSider Dailymed Drugs Offer
Drugbank DBpedia Targets ProteinProperty mapping similarLeads to σ1 . . . σn
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 12 / 54
RAVENApproach
Learning ThresholdActive perceptron learningBegin with educated guess, e.g., θi = 0.9Update thresholds based on most informative examples
Guess initial classifier
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 13 / 54
RAVENApproach
Learning ThresholdActive perceptron learningBegin with educated guess, e.g., θi = 0.9Update thresholds based on most informative examples
Guess initial classifier
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 13 / 54
RAVENApproach
Learning ThresholdActive perceptron learningBegin with educated guess, e.g., θi = 0.9Update thresholds based on most informative examples
Pick most informative examples, i.e., unclassified and closest to boundary
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 14 / 54
RAVENApproach
Learning ThresholdActive perceptron learningBegin with educated guess, e.g., θi = 0.9Update thresholds based on most informative examples
Ask for classification from oracle
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 15 / 54
RAVENApproach
Learning ThresholdActive perceptron learningBegin with educated guess, e.g., θi = 0.9Update thresholds based on most informative examples
Update classifier
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 16 / 54
RAVENEvaluation
Evaluation on Diseases (Diseasome to DBpedia)Learning rate = 0.0210 questions/iterationF-measure of up to 92%
1 3 5 7 9 11 13 15 17 19 21 23 25
Number of iterations
0
10
20
30
40
50
60
70
80
90
100
P (%)R (%)F (%)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 17 / 54
RAVENEvaluation
Learning rate = 0.0210 questions/iteration
1 3 5 7 9 11 13 15 17 19 21 23 25
Number of iterations
10
100
1000
Run
time
(ms)
DiseasesDrugsSide Effects
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 18 / 54
Table of Contents
1 Introduction
2 Raven
3 Eagle
4 Coala
5 Summary and Conclusion
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 19 / 54
EagleEfficient Active Learning of Link Specifications using Genetic Programming
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 20 / 54
EagleEfficient Active Learning of Link Specifications using Genetic Programming
EagleProvides means for automatic class and property matchingMinimizes human labeling effort through active learningAllow for learning generic specs (limitation of RAVEN)Similar approaches [NIK+12; ISE+12]
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 21 / 54
EagleFormal Definition
Same formal setting as RAVENTwo sets of restrictions resp. that specify the sets S resp. T ,a specification of mapping properties (p1, q1), . . . , (pn, qn) for the elements ofS and T anda specification of a complex similarity measure σ as the combination of severalatomic similarity measures σ1, . . . , σn and of a set of thresholds θ1, . . . , θn suchthat θi is the threshold for σi .
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 22 / 54
EagleLS example
Can learn generic classifier type
f (levenshtein(:title, :title), 0.53)
f (cosine(:venue, :year), 1.00)\
f (jaccard(:title, :authors), 0.43)
f (trigrams(:title, :year), 1.00)
ut
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 23 / 54
EagleIdea & Goal
EagleIdea: Specifications are treesGoal: Learn elements of trees through genetic operations until best LS isfound
u
(m4, θ4) (m2, θ2)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 24 / 54
Eagle AlgorithmStep 1: Generate initial population
Random process (property pairs, thresholds)Compute fitnessFitness = F-Measure w.r.t known data
(m1, θ1)
p1 q1
(m2, θ2)
p2 q2
(m3, θ3)
p3 q3
u
(m4, θ4) (m5, θ5)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 25 / 54
Eagle AlgorithmStep 2: Evolve population
Tournament between two individualsTwo operators: Mutation and crossover
(m1, θ1)
p1 q1
(m3, θ3)
p2 q2
(m2, θ2)
p3 q3
u
(m4, θ4) (m5, θ5)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54
Eagle AlgorithmStep 2: Evolve population
Tournament between two individualsTwo operators: Mutation and crossover
(m1, θ1)
p1 q1
(m3, θ3)
p2 q2
(m2, θ2)
p3 q3
u
(m4, θ4) (m5, θ5)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54
Eagle AlgorithmStep 2: Evolve population
Tournament between two individualsTwo operators: Mutation and crossover
(m1, θ1)
p1 q1
(m3, θ3)
p2 q2
(m2, θ2)
p3 q3
u
(m4, θ4) (m5, θ5)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54
Eagle AlgorithmStep 2: Evolve population
Tournament between two individualsTwo operators: Mutation and crossover
p1 q1
(m1, θ1 + α) (m3, θ3)
p2 q2
(m2, θ2)
p3 q3
u
(m4, θ4) (m5, θ5)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54
Eagle AlgorithmStep 2: Evolve population
Tournament between two individualsTwo operators: Mutation and crossover
p1 q1
(m1, θ1 + α) (m3, θ3)
p2 q2
(m2, θ2)
p3 q3
u
(m4, θ4)
p3 q3
(m2, θ2)
p3 q3
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54
Eagle AlgorithmStep 3: Computation of most informative links
Previous approaches define amount of information of link as closeness to thedecision boundaryHere, use disagreement amongst elements of population of size n
δ((s, t)) = (n − |Mti : (s, t) ∈Mi)|)(n − |Mt
i : (s, t) /∈Mi |)
Function is maximal when n2 count (s, t) as positive and n
2 as negativeCan be modeled with other functions such as entropy
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 27 / 54
Eagle AlgorithmStep 4: Active Learning
Compute d((s, t)) for all (s, t) returned by a LSPick k most informativeRequire labeling from userUpdate list of positive and negative examples
(m1, θ1 + α)
p1 q1
(m2, θ2)
p3 q3
(m3, θ3)
p2 q2
u
(m4, θ4) (m2, θ2)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 28 / 54
Eagle AlgorithmStep 5: Remove least fit elements
Fitness = F-Measure w.r.t known data
(m1, θ1 + α)
p1 q1
(m2, θ2)
p3 q3
(m3, θ3)
p2 q2
u
(m4, θ4) (m2, θ2)
p3 q3 p2 q2
If termination conditions not met, goto Step 2Else terminate and pick fittest LS
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 29 / 54
Eagle AlgorithmStep 5: Remove least fit elements
Fitness = F-Measure w.r.t known data
(m1, θ1 + α)
p1 q1
(m2, θ2)
p3 q3
(m3, θ3)
p2 q2
u
(m4, θ4) (m2, θ2)
p3 q3 p2 q2
If termination conditions not met, goto Step 2Else terminate and pick fittest LS
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 29 / 54
Eagle AlgorithmStep 5: Remove least fit elements
Fitness = F-Measure w.r.t known data
(m1, θ1 + α)
p1 q1
(m2, θ2)
p3 q3
(m3, θ3)
p2 q2
u
(m4, θ4) (m2, θ2)
p3 q3 p2 q2
If termination conditions not met, goto Step 2Else terminate and pick fittest LS
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 29 / 54
Eagle AlgorithmUnsupervised learning
Measure degree of monogamy of links [NIK+12]Only works for 1-1 relations, e.g., owl:sameAs
P(M) = |s|∃t : (s, t) ∈ M|∑s|t : (s, t) ∈ M| ,R(M) = |t|∃s : (s, t) ∈ M|∑
t|s : (s, t) ∈ M| . (1)
Fβ(M) = (1 + β2) Pd(M)Rd(M)β2Pd(M) +Rd(M) (2)
s1
s2
s3
t1
t2
t3
t4
linklinklinklink
P = 3/4, R = 2/4, F = 3/5.Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 30 / 54
EagleExperiments and Results
Experimental Setup:Compared batch learning and genetic programmingUsed 3 different data sets
1 Dailymed-Drugbank (LATC)2 DBpedia-LinkedMDB (LATC)3 DBLP-ACM
Compared different sizes of population (20,100)Compared random annotation with active learningMutation and crossover rates = 0.6Maximal number of iterations = 50
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 31 / 54
EagleExperiments and Results
Dailymed-Drugbank
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 32 / 54
EagleExperiments and Results
DBpedia-LinkedMDB
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 33 / 54
EagleExperiments and Results
DBLP-ACM
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 34 / 54
EagleExperiments and Results
Larger population leads toBetter results, yetLonger runtimes
For most datasets, population size of 100 seems sufficient for most linkeddata setsEAGLE is more time-efficient than state of the art
337s for ACM-DBLP (n=100) vs.1553s for Marlin (ADTree)2196s for Marlin (SVM)4320s for Febrl (SVM)
Active learning clearly outperforms random annotation
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 35 / 54
Table of Contents
1 Introduction
2 Raven
3 Eagle
4 Coala
5 Summary and Conclusion
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 36 / 54
CoalaCorrelation-Aware Active Learning of Link Specifications
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 37 / 54
CoalaLearning Complex Specifications
Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 38 / 54
CoalaLearning Complex Specifications
Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 38 / 54
CoalaLearning Complex Specifications
Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 38 / 54
CoalaLearning Complex Specifications
Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 39 / 54
CoalaLearning Complex Specifications
Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 39 / 54
CoalaLearning Complex Specifications
Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 39 / 54
CoalaLearning Complex Specifications
InsightChoice of right example is key for learningSo far, only use of informativeness
QuestionCan we do better by using more information?Higher F-measureOften slower
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 40 / 54
CoalaLearning Complex Specifications
InsightChoice of right example is key for learningSo far, only use of informativeness
QuestionCan we do better by using more information?
Higher F-measureOften slower
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 40 / 54
CoalaLearning Complex Specifications
InsightChoice of right example is key for learningSo far, only use of informativeness
QuestionCan we do better by using more information?Higher F-measureOften slower
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 40 / 54
Coala ApproachBasic Idea
Use similarity of link candidates when selecting most informative examples
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 41 / 54
Coala ApproachBasic Idea
Use similarity of link candidates when selecting most informative examples
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 41 / 54
Coala ApproachBasic Idea
Use similarity of link candidates when selecting most informative examples
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 41 / 54
CoalaSimilarity of Candidates
Link candidate x = (s, t) can be regarded as vector(σ1(x), . . . , σn(x)) ∈ [0, 1]n.Similarity of link candidates x and y :
sim(x , y) = 1
1 +√
n∑i=1
(σi(x)− σi(y))2
. (3)
Allows exploiting both intra- and inter-class similarity
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 42 / 54
CoalaGraph Clustering
Rationale: Use intra-class similarityApproach
Cluster elements of S+ and S− independentlyChoose one element per cluster as representativePresent oracle with most informative representatives
0.8
0.9
0.8
S+
S-
0.8
0.9
0.8
0.25
0.25
0.9
0.80.8
0.8
0.25a
b
c
d
e
d
f g
hi
k
l
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 43 / 54
CoalaBorderFlow
G = (V ,E , ω) with V = S+ or V = S−
ω(x , y) = sim(x , y)Keep best ec edges for each x ∈ V
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 44 / 54
CoalaBorderFlow
Seed-based algorithmGoal: Maximize borderflow ratio bf (X ) = Ω(b(X),X)
Ω(b(X),n(X))
http://sourceforge.net/projects/cugar-framework/
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 45 / 54
CoalaBorderFlow
Seed-based algorithmGoal: Maximize borderflow ratio bf (X ) = Ω(b(X),X)
Ω(b(X),n(X))
http://sourceforge.net/projects/cugar-framework/
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 45 / 54
CoalaBorderFlow
Seed-based algorithmGoal: Maximize borderflow ratio bf (X ) = Ω(b(X),X)
Ω(b(X),n(X))
http://sourceforge.net/projects/cugar-framework/Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 45 / 54
CoalaBorderFlow
Seed-based algorithmGoal: Maximize borderflow ratio bf (X ) = Ω(b(X),X)
Ω(b(X),n(X))
http://sourceforge.net/projects/cugar-framework/
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 46 / 54
CoalaBorderFlow
Seed-based algorithmGoal: Maximize borderflow ratio bf (X ) = Ω(b(X),X)
Ω(b(X),n(X))
http://sourceforge.net/projects/cugar-framework/Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 46 / 54
CoalaSpreading Activation
Rationale: Use both inter- and intra-class similarityApproach
M0 : mij = sim(xi , xj) with (xi , xj) ∈ (S+ ∪ S−)2
A0 : ai = ifm(xi )
At = At−1 + Mt−1At−1 (spread activation)At = At/max(At) (normalize)Mt = M r©
t−1 (weight decay)
3 iterations 3.9*10-3
0.97
0.691
0.73 1.5*10-5
3.9*10-3
1.5*10-5
3.9*10-3
S+ S-
0.8
0.80.9
0.90.25
0.5 0.5
0.25
0.5
S+ S-
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 47 / 54
CoalaSpreading Activation
Rationale: Use both inter- and intra-class similarityApproach
M0 : mij = sim(xi , xj) with (xi , xj) ∈ (S+ ∪ S−)2
A0 : ai = ifm(xi )At = At−1 + Mt−1At−1 (spread activation)At = At/max(At) (normalize)Mt = M r©
t−1 (weight decay)
3 iterations 3.9*10-3
0.97
0.691
0.73 1.5*10-5
3.9*10-3
1.5*10-5
3.9*10-3
S+ S-
0.8
0.80.9
0.90.25
0.5 0.5
0.25
0.5
S+ S-
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 47 / 54
CoalaSpreading Activation
Rationale: Use both inter- and intra-class similarityApproach
M0 : mij = sim(xi , xj) with (xi , xj) ∈ (S+ ∪ S−)2
A0 : ai = ifm(xi )At = At−1 + Mt−1At−1 (spread activation)At = At/max(At) (normalize)Mt = M r©
t−1 (weight decay)
3 iterations 3.9*10-3
0.97
0.691
0.73 1.5*10-5
3.9*10-3
1.5*10-5
3.9*10-3
S+ S-
0.8
0.80.9
0.90.25
0.5 0.5
0.25
0.5
S+ S-
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 47 / 54
Coala EvaluationExperimental Setup
Used EAGLE as active learning approachMutation and crossover rate = 0.6Selection rate = 0.7Not deterministic ⇒ Ran each experiment 5 times5 queries to oracle per iteration10 iterations overall2 populations sizes: 20 and 10050 generations between iterations
Two real-world and three synthetic datasetsSingle thread of a server (JDK1.7, Ubuntu 10.0.4, AMD Opteron 2GHz,2GB/Experiment)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 48 / 54
Coala EvaluationParameters for WD
Ran experiments on DBLP-ACMPopulation = 20r ∈ 2, 4, 8, 16, 32
10 20 30 40 50 60 70 80 90 1000.5
0.6
0.7
0.8
0.9
1
F-sc
ore
0
200
400
600
800
1,000
runti
me in s
eco
nds
f(2) f(4) f(8) f(16) f(32)d(2) d(4) d(8) d(16) d(32)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 49 / 54
Coala EvaluationParameters for CL
Ran experiments on DBLP-ACMPopulation = 20ec ∈ 1, 2, 3, 4, 5
10 20 30 40 50 60 70 80 90 1000.5
0.6
0.7
0.8
0.9
1
F-sc
ore
0
500
1,000
1,500
2,000
runti
me in s
eco
nd
s
f(1) f(2) f(3) f(4) f(5)d(1) d(2) d(3) d(4) d(5)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 50 / 54
Coala EvaluationF-Scores
Population = 100, final valuesBetter results, yet unclear when to use WD or CL
DataSet EAGLE WD CLAbt 0.19±0.04 0.25±0.04 0.23±0.04DBLP 0.91±0.03 0.96±0.01 0.96±0.02Person1 0.86±0.02 0.89±0.01 0.81±0.18Person2 0.74±0.03 0.71±0.08 0.77±0.03Restaurant 0.89±0.0 0.86±0.02 0.89±0.0
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 51 / 54
Table of Contents
1 Introduction
2 Raven
3 Eagle
4 Coala
5 Summary and Conclusion
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 52 / 54
Summary and Conclusion
Large number of challenges tolearning accurate specifications
1 Reduce labeling effort⇒ Active learning
2 Learn complex specifications⇒ Genetic programming
3 Learn specifications efficienty⇒ See previous slides
Challenges include1 Determinism2 Deep learning3 Self-checking4 . . .
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 53 / 54