![Page 1: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006](https://reader036.vdocuments.net/reader036/viewer/2022070401/56649f1e5503460f94c35dcf/html5/thumbnails/1.jpg)
Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the oMAP Framework
Wednesday 14th of June, 2006
Raphaël Troncy, Umberto Straccia
Motivation
- Various SW repositories, using different vocabularies, distributed on the web
- Already large amounts of data out there
  - Swoogle hits 1.5M unique Semantic Web documents (05/06/2006)
- Problem: How to search and retrieve information in such an environment?
Example scenario
Kim Clijsters (courtesy of AFP)
Montenegro independence (courtesy of Euronews)
NewsML
SportsML
EventsML
iCalendar
TimeML
<newsItem schema="0.7" version="2">
  <itemMeta>
    <contentClass code="ccls:photo" />
  </itemMeta>
  <contentMeta>
    <infoSource literal="AFP" />
    <locCreated code="city:Paris"/>
  </contentMeta>
  ...
</newsItem>
EXIF
EBU P/Meta
MXF
MPEG-7
Distributed Search in the SW:
- Resource selection
  - Select a subset of relevant resources
- Query reformulation
  - Reformulate the information need into the vocabulary used by the resource
- Data fusion and rank aggregation
  - Merge and rank all the results together
Resource Selection
- Compute an approximation of the content of each resource
- For some random queries, an approximation consists of:
  - The ontology the resource relies on
  - Some instances (sampling annotated documents)
Query reformulation
- Transformation rules:
  - From the query vocabulary to the vocabularies used by the resources
- Semantic Web: Ontology Alignment
  - Establishing relationships holding between the entities (subsumption, equivalence, disjointness…)
  - With a confidence measure (between 0 and 1)
  - Automatically computed
- oMAP: an ontology alignment framework
oMAP: Ontology Alignment Tool
Terminological Classifiers
Machine Learning-based Classifiers
Structural and Semantics-based Classifiers
- Formal and open framework
- Classifiers customization (parameter, chaining)
subsumption
equivalent
oMAP: A Formal Framework
- Sources of inspiration:
  - Formal work in data exchange [Fagin et al., 2003]
  - GLUE: combining several specialized components for finding the best set of mappings [Doan et al., 2003]
- Notation:
  - A mapping is a triple: M = (T, S, ∑)
  - S and T are the source and target ontologies
  - Si is an OWL entity (class, datatype property, object property) of the ontology
  - ∑ is a set of mapping rules: αij Tj ← Si
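The notation above can be pictured as a small data structure. The following is only a sketch: the class names `MappingRule` and `Mapping`, the method names, and the check that weights lie in [0, 1] are assumptions made for illustration and are not taken from the oMAP implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MappingRule:
    """One weighted rule alpha_ij : T_j <- S_i (hypothetical representation)."""
    source: str    # S_i, an entity of the source ontology
    target: str    # T_j, an entity of the target ontology
    weight: float  # alpha_ij, the confidence attached to the rule


@dataclass
class Mapping:
    """The triple M = (T, S, Sigma)."""
    target_entities: set[str]
    source_entities: set[str]
    rules: list[MappingRule] = field(default_factory=list)

    def add_rule(self, source: str, target: str, weight: float) -> None:
        # Only entities that belong to the two ontologies can be related.
        if source not in self.source_entities or target not in self.target_entities:
            raise ValueError("rule refers to an unknown entity")
        # Assumption: confidence values are normalised to [0, 1].
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must lie in [0, 1]")
        self.rules.append(MappingRule(source, target, weight))

    def rules_for(self, source: str) -> list[MappingRule]:
        """Rules rewriting a given source entity, highest weight first."""
        matching = [r for r in self.rules if r.source == source]
        return sorted(matching, key=lambda r: r.weight, reverse=True)


mapping = Mapping(
    target_entities={"American_History", "Latin_American_History"},
    source_entities={"History_Americas"},
)
mapping.add_rule("History_Americas", "Latin_American_History", 0.63)
mapping.add_rule("History_Americas", "American_History", 0.84)
best = mapping.rules_for("History_Americas")[0]
```

The entity names and weights reuse the course-material example shown later in the deck.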
oMAP: Combining Classifiers
- Weight of a mapping rule: αij = w(Si, Tj, ∑)
- Using different classifiers:
  - w(Si, Tj, CLk) is the classifier's approximation of the rule Tj ← Si
- Combining the approximations:
  - Use of a priority list: CL1 ≺ CL2 ≺ … ≺ CLn
  - Weighted average of the classifiers' predictions
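The two combination strategies can be sketched as follows. This is an illustration under assumptions: a classifier is modelled as a function that returns a weight or `None` when it has no opinion, the priority list is read as "take the first classifier that answers", and the function names are hypothetical.

```python
from typing import Callable, Optional, Sequence

# A classifier returns a weight in [0, 1], or None when it cannot judge the pair.
Classifier = Callable[[str, str], Optional[float]]


def combine_by_priority(s: str, t: str, classifiers: Sequence[Classifier]) -> float:
    """Return the prediction of the first classifier in the list that answers."""
    for classifier in classifiers:
        weight = classifier(s, t)
        if weight is not None:
            return weight
    return 0.0


def combine_by_average(s: str, t: str, classifiers: Sequence[Classifier],
                       importances: Optional[Sequence[float]] = None) -> float:
    """Weighted average of the predictions of all classifiers that answer."""
    if importances is None:
        importances = [1.0] * len(classifiers)
    total, norm = 0.0, 0.0
    for classifier, importance in zip(classifiers, importances):
        prediction = classifier(s, t)
        if prediction is not None:
            total += importance * prediction
            norm += importance
    return total / norm if norm else 0.0


def exact_name(s: str, t: str) -> Optional[float]:
    # Toy classifier: answers only when the two names are identical.
    return 1.0 if s == t else None


def shared_prefix(s: str, t: str) -> Optional[float]:
    # Toy classifier: length of the common prefix over the longer name.
    common = 0
    for a, b in zip(s.lower(), t.lower()):
        if a != b:
            break
        common += 1
    return common / max(len(s), len(t), 1)


by_priority = combine_by_priority("Author", "Authors", [exact_name, shared_prefix])
by_average = combine_by_average("Author", "Author", [exact_name, shared_prefix], [3.0, 1.0])
```

With the priority list, `exact_name` abstains on "Author" / "Authors", so the answer comes from `shared_prefix`.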
Terminological Classifiers
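- Same entity names (or URI)
$$w(S_i, T_j, CL_N) = \begin{cases} 1 & \text{if } S_i, T_j \text{ have the same name} \\ 0 & \text{otherwise} \end{cases}$$
- Same entity name stems
$$w(S_i, T_j, CL_S) = \begin{cases} 1 & \text{if } S_i, T_j \text{ have the same stem} \\ 0 & \text{otherwise} \end{cases}$$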
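A minimal sketch of these two classifiers is given below. The helper `local_name` (which keeps the fragment of a URI) and the very rough `naive_stem` are assumptions made for illustration; the deck does not say which stemmer is used.

```python
def local_name(entity: str) -> str:
    """Keep the part of a URI after the last '#' or '/' (assumed naming convention)."""
    return entity.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def naive_stem(name: str) -> str:
    """Toy stemmer: lowercase, drop a plural 's', then a trailing 'ing' or 'ed'."""
    word = name.lower()
    if word.endswith("s") and len(word) > 3:
        word = word[:-1]
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    return word


def w_name(s: str, t: str) -> float:
    """1 if both entities have the same (local) name, 0 otherwise."""
    return 1.0 if local_name(s) == local_name(t) else 0.0


def w_stem(s: str, t: str) -> float:
    """1 if both entity names reduce to the same stem, 0 otherwise."""
    return 1.0 if naive_stem(local_name(s)) == naive_stem(local_name(t)) else 0.0


same_name = w_name("http://a.example/onto#Author", "http://b.example/bib#Author")
same_stem = w_stem("Publishing", "Published")
```

A real setting would replace `naive_stem` with a proper stemming algorithm; the two-valued output is the point of the sketch.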
Terminological Classifiers
- String distance name
$$w(S_i, T_j, CL_{LD}) = \frac{dist_{Levenshtein}(S_i, T_j)}{\max(length(S_i), length(T_j))}$$
- Iterative substring matching
  - See [Stoilos et al., ISWC'05]
$$w(S_i, T_j, CL_{IS}) = Comm(S_i, T_j) - Diff(S_i, T_j) + winkler(S_i, T_j)$$
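The string distance classifier can be sketched with a standard dynamic-programming Levenshtein distance. Only this first classifier is illustrated; the iterative substring metric is not. The function names are hypothetical, and `similarity` (one minus the ratio) is an assumption added for convenience, since the slide only gives the normalised ratio.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(s: str, t: str) -> float:
    """Levenshtein distance divided by the length of the longer name."""
    longest = max(len(s), len(t))
    return levenshtein(s, t) / longest if longest else 0.0


def similarity(s: str, t: str) -> float:
    """Assumption: turn the normalised distance into a similarity in [0, 1]."""
    return 1.0 - normalized_distance(s, t)


ratio = normalized_distance("kitten", "sitting")
```

Identical names give a ratio of 0 and names sharing no character in any position give 1, so the ratio always stays in [0, 1].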
Terminological Classifiers
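- WordNet distance name
$$w(S_i, T_j, CL_{WN}) = \begin{cases} 1 & \text{if } S_i, T_j \text{ are synonyms} \\ \max\left(sim, \frac{2 \cdot lcs}{length(S_i) + length(T_j)}\right) & \text{otherwise} \end{cases}$$
- lcs is the longest common substring between Si and Tj
$$sim = \frac{|synonym(S_i) \cap synonym(T_j)|}{|synonym(S_i) \cup synonym(T_j)|}$$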
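The WordNet-based weight can be sketched without a WordNet dependency by passing the synonym sets in explicitly. The dictionary `SYNONYMS` below is toy data invented for the example, the function names are hypothetical, and `lcs` is taken to be the length of the longest common substring.

```python
def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest run of characters shared by a and b."""
    best = 0
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            length = previous[j - 1] + 1 if ca == cb else 0
            current.append(length)
            best = max(best, length)
        previous = current
    return best


def w_wordnet(s: str, t: str, synonyms: dict[str, set[str]]) -> float:
    syn_s = synonyms.get(s, set())
    syn_t = synonyms.get(t, set())
    # Direct synonyms get the maximal weight.
    if t in syn_s or s in syn_t:
        return 1.0
    # Otherwise: overlap of the synonym sets, or the substring-based score.
    union = syn_s | syn_t
    sim = len(syn_s & syn_t) / len(union) if union else 0.0
    total = len(s) + len(t)
    lcs_score = 2 * longest_common_substring(s, t) / total if total else 0.0
    return max(sim, lcs_score)


# Toy synonym sets (hypothetical data, not extracted from WordNet).
SYNONYMS = {
    "paper": {"article", "report"},
    "report": {"paper", "account"},
    "document": {"paper", "record"},
}

direct = w_wordnet("paper", "article", SYNONYMS)
overlap = w_wordnet("report", "document", SYNONYMS)
substring = w_wordnet("proceedings", "inproceedings", SYNONYMS)
```

Here "report" and "document" share one synonym out of three, which beats their substring score, while "proceedings" and "inproceedings" are matched by the substring term alone.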
Machine Learning-Based Classifiers
- Collecting bag of words:
  - label for the named individuals
  - data value for the datatype properties
  - type for the anonymous individuals and the range of object properties…
- Recursion on the OWL definition: depth parameter
- Use statistical methods on the collected bag of words
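One possible reading of the collection step is sketched below on the conference example from the deck. The dictionary layout of `INDIVIDUALS`, the function name `collect_words`, and the exact treatment of the depth parameter are assumptions; the deck only states which kinds of values are collected.

```python
# Toy individuals mirroring the conference example; the layout is hypothetical.
INDIVIDUALS = {
    "x1": {
        "type": "Conference",
        "anonymous": False,
        "data": {"label": "European SW Conf"},
        "objects": {"location": "x2"},
    },
    "x2": {
        "type": "Address",
        "anonymous": True,
        "data": {"city": "Budva", "country": "Montenegro"},
        "objects": {},
    },
}


def collect_words(name: str, depth: int = 1) -> list[str]:
    """Collect the bag of words of an individual, following object
    properties for at most `depth` levels."""
    individual = INDIVIDUALS[name]
    words = []
    # Anonymous individuals contribute their type.
    if individual["anonymous"]:
        words.append(individual["type"])
    # Labels and datatype property values are taken as they are.
    words.extend(individual["data"].values())
    for target in individual["objects"].values():
        if depth > 1:
            # Recurse into the related individual; depth bounds the recursion.
            words.extend(collect_words(target, depth - 1))
        else:
            # At the last level only the type of the range is kept.
            words.append(INDIVIDUALS[target]["type"])
    return words


u1 = collect_words("x1")
u2 = collect_words("x2")
deeper = collect_words("x1", depth=2)
```

With depth 1 this reproduces the two bags given for the example; a larger depth pulls in the words of the related individuals as well.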
Machine Learning-Based Classifiers
Example
Individual (x1 type (Conference)
    value (label "European SW Conf")
    value (location x2) )
Individual (x2 type (Address)
    value (city "Budva")
    value (country "Montenegro") )
u1 = ("European SW Conf", "Address")
u2 = ("Address", "Budva", "Montenegro")
- Naïve Bayes text classifier
$$w(S_i, T_j, CL_{NB}) = \Pr(S_i) \cdot \prod_{(x,u) \in T_j} \prod_{m \in u} \Pr(m \mid S_i)$$
- kNN text classifier
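The Naïve Bayes weight can be sketched as below. Several choices are assumptions that the deck does not specify: the class `NaiveBayes` and its method names, the prior estimated from the share of collected words, Laplace smoothing for unseen words, the computation in log space, and the final normalisation across source entities.

```python
import math
from collections import Counter


class NaiveBayes:
    def __init__(self, training: dict[str, list[str]]):
        # training maps each source entity S_i to the words collected for it.
        self.counts = {s: Counter(words) for s, words in training.items()}
        self.totals = {s: sum(c.values()) for s, c in self.counts.items()}
        self.vocabulary = {w for words in training.values() for w in words}
        self.n_words = sum(self.totals.values())

    def log_score(self, source: str, target_bags: list[list[str]]) -> float:
        """log Pr(S_i) plus the sum of log Pr(m | S_i) over every word m
        of every bag u collected for the target entity."""
        score = math.log(self.totals[source] / self.n_words)
        size = len(self.vocabulary)
        for bag in target_bags:
            for word in bag:
                # Laplace smoothing keeps unseen words from zeroing the product.
                probability = (self.counts[source][word] + 1) / (self.totals[source] + size)
                score += math.log(probability)
        return score

    def weights(self, target_bags: list[list[str]]) -> dict[str, float]:
        """Scores normalised over all source entities, so they sum to 1."""
        logs = {s: self.log_score(s, target_bags) for s in self.counts}
        top = max(logs.values())
        exps = {s: math.exp(value - top) for s, value in logs.items()}
        norm = sum(exps.values())
        return {s: value / norm for s, value in exps.items()}


training = {
    "Conference": ["european", "sw", "conf", "budva", "conference"],
    "Person": ["raphael", "umberto", "name", "email"],
}
model = NaiveBayes(training)
result = model.weights([["european", "conf"], ["budva"]])
```

In this toy run the target bags share three words with "Conference" and none with "Person", so "Conference" receives the larger weight.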
Structural and Semantics-Based Classifier
- ∑ is a set of mapping rules: αij Tj ← Si
- ∑ sets are computed by taking the OWL definition of the entities to align
  - recursively in the OWL structure...
  - without looping, thanks to cycle detection
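Structural and Semantics-Based Classifier
- If Si and Tj are property names:
$$w(S_i, T_j, \Sigma) = \begin{cases} 0 & \text{if } T_j \leftarrow S_i \notin \Sigma \\ w'(S_i, T_j, \Sigma) & \text{otherwise} \end{cases}$$
- If Si and Tj are concept names¹:
$$w(S_i, T_j, \Sigma) = \begin{cases} 0 & \text{if } T_j \leftarrow S_i \notin \Sigma \\ w'(S_i, T_j, \Sigma) & \text{if } T_j \leftarrow S_i \in \Sigma \text{ and } D = \emptyset \\ \frac{1}{|Set| + 1} \left( w'(S_i, T_j, \Sigma) + \max_{Set} \sum_{(C_i, D_j) \in Set} w(C_i, D_j, \Sigma) \right) & \text{otherwise} \end{cases}$$
¹ Where D = D(Si) × D(Tj); D(Si) represents the set of concepts that are direct parents of Si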
Structural and Semantics-Based Classifier
- Let CS = (Q R.C) and DT = (Q' R'.D), then¹:
$$w(C_S, D_T, \Sigma) = w_Q(Q, Q') \cdot w(R, R', \Sigma) \cdot w(C, D, \Sigma)$$
- Let CS = (op C1 … Cm) and DT = (op' D1 … Dn), then²:
$$w(C_S, D_T, \Sigma) = w_{op}(op, op') \cdot \frac{\max_{Set} \sum_{(C_i, D_j) \in Set} w(C_i, D_j, \Sigma)}{\min(m, n)}$$
¹ Where Q, Q' are quantifiers, R, R' are property names and C, D are concept expressions
² Where op, op' are concept constructors and n, m ≥ 1
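The second formula can be illustrated with a brute-force search over pairings. This sketch assumes that Set ranges over one-to-one pairings of the operands of the shorter expression with operands of the longer one, which the slide does not state explicitly. The `W_OP` values follow the table of possible w_op weights given in the deck; the function names, the symmetric lookup, and the `BASE` weights are invented for the example.

```python
from itertools import permutations
from typing import Callable, Sequence

# Constructor compatibility weights (and / or / not), looked up symmetrically.
W_OP = {
    ("and", "and"): 1.0, ("and", "or"): 0.25, ("and", "not"): 0.0,
    ("or", "or"): 1.0, ("or", "not"): 0.0, ("not", "not"): 1.0,
}


def w_op(op1: str, op2: str) -> float:
    return W_OP.get((op1, op2), W_OP.get((op2, op1), 0.0))


def best_pairing(cs: Sequence[str], ds: Sequence[str],
                 w: Callable[[str, str], float]) -> float:
    """Maximum, over one-to-one pairings, of the summed weights w(C_i, D_j)."""
    m, n = len(cs), len(ds)
    best = 0.0
    if m <= n:
        for chosen in permutations(range(n), m):
            best = max(best, sum(w(cs[i], ds[j]) for i, j in enumerate(chosen)))
    else:
        for chosen in permutations(range(m), n):
            best = max(best, sum(w(cs[i], ds[j]) for j, i in enumerate(chosen)))
    return best


def w_constructor(op1: str, cs: Sequence[str], op2: str, ds: Sequence[str],
                  w: Callable[[str, str], float]) -> float:
    """w_op(op, op') times the best pairing score, divided by min(m, n)."""
    return w_op(op1, op2) * best_pairing(cs, ds, w) / min(len(cs), len(ds))


# Hypothetical weights between operand concepts.
BASE = {("Paper", "Article"): 0.9, ("Paper", "Book"): 0.2,
        ("Author", "Writer"): 0.8, ("Author", "Article"): 0.1}


def base_weight(c: str, d: str) -> float:
    return BASE.get((c, d), 0.0)


same_constructor = w_constructor("and", ["Paper", "Author"],
                                 "and", ["Writer", "Article", "Book"], base_weight)
mixed_constructor = w_constructor("and", ["Paper", "Author"],
                                  "or", ["Writer", "Article", "Book"], base_weight)
```

Enumerating permutations is exponential in the number of operands; it is acceptable here only because concept expressions usually have few operands, and an assignment algorithm would be the natural replacement for larger inputs.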
Evaluation
- OAEI Contests (2004, 2005, 2006): http://oaei.ontologymatching.org/
- Systematic benchmark tests on bibliographic data
  - Tests 2xx: aligning an ontology with variations of itself, where OWL constructs are discarded or modified one at a time
  - Tests 3xx: four real bibliographic ontologies
- Web categories alignment
  - http://oaei.ontologymatching.org/2005/results/
Benchmark Tests
| System | Precision | Recall |
|---|---|---|
| dublin20 | 0.92 | 0.72 |
| Falcon | 0.91 | 0.89 |
| FOAM | 0.90 | 0.69 |
| oMAP | 0.85 | 0.68 |
| CMS | 0.81 | 0.18 |
| OLA | 0.80 | 0.74 |
| ctxMatch | 0.72 | 0.20 |
| edna | 0.45 | 0.61 |

oMAP:
- 4th with the global F-Measure
- 1st on 3xx tests (real ontologies to align)
Aligning Web Categories
- Aligning Google, Looksmart and Yahoo web categories [Avesani et al., ISWC'05]
- Blind tests: only recall results are available

| System | Recall |
|---|---|
| ctxMatch | 9.4% |
| FOAM | 11.9% |
| CMS | 14.1% |
| Dublin20 | 26.5% |
| Falcon | 31.2% |
| OLA | 32.0% |
| oMAP | 34.4% |
Distributed Search in the SW
Q: retrieve course material dealing with history of the Americas and "Columbus"
    query(d) <- History_Americas(d, "Columbus")
is re-written as two queries
    0.63 query(d) <- Latin_American_History(d, "Columbus")
    0.84 query(d) <- American_History(d, "Columbus")
Each document score is then multiplied with the confidence score of the rule.
university courses
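The rewriting and rescoring step can be sketched as follows. The rule weights and concept names come from the example above; the toy `INDEX`, the function names, and the choice to keep the best rescored value when a document is returned by several rewritten queries are assumptions, since the deck only states that scores are multiplied by the rule confidence.

```python
# Mapping rules as (confidence, target concept, source concept).
RULES = [
    (0.63, "Latin_American_History", "History_Americas"),
    (0.84, "American_History", "History_Americas"),
]

# Toy resource: concept -> documents with their retrieval scores (invented data).
INDEX = {
    "American_History": [("doc1", 0.9), ("doc2", 0.5)],
    "Latin_American_History": [("doc2", 1.0), ("doc3", 0.4)],
}


def reformulate(concept: str) -> list[tuple[float, str]]:
    """Rewrite a query concept into the weighted concepts of the resource."""
    return [(weight, target) for weight, target, source in RULES if source == concept]


def resource_search(concept: str, keyword: str) -> list[tuple[str, float]]:
    # Stand-in for the remote resource; the keyword is ignored in this toy index.
    return INDEX.get(concept, [])


def distributed_search(concept: str, keyword: str) -> list[tuple[str, float]]:
    merged: dict[str, float] = {}
    for confidence, target in reformulate(concept):
        for document, score in resource_search(target, keyword):
            # Each document score is multiplied by the confidence of the rule.
            rescored = confidence * score
            # Assumption: keep the best score when a document appears twice.
            merged[document] = max(merged.get(document, 0.0), rescored)
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


ranked = distributed_search("History_Americas", "Columbus")
```

In this toy run "doc2" is returned by both rewritten queries and keeps the score obtained through the 0.63 rule, because 0.63 × 1.0 exceeds 0.84 × 0.5.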
Conclusion
- Distributed Search in the SW
  - resource selection / query reformulation / data fusion and rank aggregation
- oMAP: a formal framework for automatically aligning OWL ontologies
  - Combining several specific classifiers
    - Terminological classifiers
    - Machine learning-based classifiers
    - Structural and semantics-based classifier
Future Work
- Implementing the three steps proposed
  - Keyword-based or structured (SPARQL) queries
  - Ranked list of results
- oMAP
  - Using additional classifiers: KL-distance, other resources, background knowledge, etc.
    - Straightforward theoretically but practically difficult!
  - Finding complex alignments
    - name = firstName + lastName
  - OWL and rule-based languages:
    - Take into account this additional expressivity
http://www.cwi.nl/~troncy/oMAP/
Any questions?
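Structural and Semantics-Based Classifier
Possible values for wop and wQ weights

| wop | ⊓ | ⊔ | ¬ |
|---|---|---|---|
| ⊓ | 1 | 1/4 | 0 |
| ⊔ | | 1 | 0 |
| ¬ | | | 1 |

| wQ | ∃ | ∀ |
|---|---|---|
| ∃ | 1 | 1/4 |
| ∀ | | 1 |

| wQ | ≤ n | ≥ n |
|---|---|---|
| ≤ m | 1 | 1/3 |
| ≥ m | | 1 |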