survey on common strategies of vocabulary reuse in linked open data modeling @eswc2014
Post on 13-Aug-2015
Survey on Common Strategies of Vocabulary Reuse in Linked Open Data Modeling
Johann Schaible
GESIS Leibniz-Institute for the Social Sciences, Cologne, Germany
johann.schaible@gesis.org
Thomas Gottron
Institute for Web Science and Technologies, University of Koblenz-Landau, Germany
gottron@uni-koblenz.de
Ansgar Scherp
Kiel University and Leibniz Information Center for Economics, Kiel, Germany
mail@ansgarscherp.net
1) Extended Version as technical report: http://bit.ly/lodsurveyreport
2) Raw result data and survey in PDF: http://bit.ly/lodsurveydata
#eswc2014 Schaible
Motivation…
• How to…
– …choose which vocabulary to reuse?
– …find an appropriate mix of vocabularies?
• In order to achieve aspects such as
– providing a clear data structure
– making data easier to consume
– achieving ontological agreement
• Leads to different reuse strategies
• Based on experience and “gut-feeling”
…and Contribution
Condense and aggregate experts’ knowledge and experience (“gut-feeling”)
1. Which aspects for reusing vocabularies are most important
2. Which vocabulary reuse strategy to follow in a real-world scenario
Survey Design
• Aspects for reusing vocabularies
• Ranking Task T1: Reuse vs. Interlink
• Ranking Task T2: Appropriate mix of vocabularies
• Ranking Task T3: Additional meta-information
• Reasons for ranking decision
• Perspective of a LOD modeler
• “Suppose, you have to model data as LOD…”
Ranking Tasks Structure
Assignment:
• Model data from a specific domain as LOD
• Need to reuse vocabularies
• “Which of the provided options do you consider the better vocabulary reuse strategy?”
Ranking Tasks Example
Strategy minV: Reuse a minimum amount of vocabularies
Strategy pop: Reuse mainly popular vocabularies
Features for popularity:
• Number of datasets using vocabulary V
• Total occurrence of vocabulary term vi
[Figure: two example models, one labeled “Strategy: minV” and one labeled “Strategy: pop”]
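The two popularity features named above can be sketched in a few lines. This is only an illustration under stated assumptions: the dataset names and term lists in `DATASETS` are hypothetical toy data, not figures from the survey, and the helper names are invented for this example.

```python
from collections import Counter

# Hypothetical toy data (NOT from the survey): each dataset is listed with
# the vocabulary terms it uses, one entry per occurrence, as prefix:localName.
DATASETS = {
    "dataset_a": ["foaf:name", "foaf:name", "dcterms:title"],
    "dataset_b": ["foaf:name", "schema:name"],
    "dataset_c": ["dcterms:title", "dcterms:creator", "foaf:knows"],
}


def vocabulary_of(term):
    """Return the vocabulary prefix of a prefixed term, e.g. 'foaf:name' -> 'foaf'."""
    return term.split(":", 1)[0]


def datasets_using_vocabulary(datasets):
    """Feature 1: for each vocabulary V, the number of datasets using V."""
    counts = Counter()
    for terms in datasets.values():
        # A vocabulary counts once per dataset, however many terms are used.
        for prefix in {vocabulary_of(t) for t in terms}:
            counts[prefix] += 1
    return counts


def total_term_occurrence(datasets):
    """Feature 2: for each term vi, its total occurrence over all datasets."""
    counts = Counter()
    for terms in datasets.values():
        counts.update(terms)
    return counts


if __name__ == "__main__":
    print(datasets_using_vocabulary(DATASETS).most_common())
    print(total_term_occurrence(DATASETS).most_common())
```

The first feature describes a whole vocabulary, the second a single term, so a term can be frequent even if few datasets use its vocabulary.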
Ranking Task T1
Reuse vs. Interlink
• Domain: Movies and actors
• Vocabulary reuse strategies:
1. pop: Reuse popular vocabularies
2. link: Define own vocabulary and link it to existing popular vocabulary ()
3. max: Reuse a maximum amount of vocabularies (lower boundary)
• Number of possible models to choose from: 3
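To make the difference between the pop and link strategies concrete, here is a minimal sketch that writes the same actor description both ways as plain (subject, predicate, object) tuples. Everything in it is an assumption made for illustration: the `my:` namespace, the `ex:actor1` resource, and the choice of `rdfs:subClassOf` and `rdfs:subPropertyOf` as linking properties are not taken from the survey, which does not state here which linking property the link model used.

```python
RDF_TYPE = "rdf:type"


def model_pop(actor_uri, name):
    """Strategy pop: describe the actor directly with popular vocabulary terms."""
    return [
        (actor_uri, RDF_TYPE, "foaf:Person"),
        (actor_uri, "foaf:name", name),
    ]


def model_link(actor_uri, name):
    """Strategy link: use own terms and link them to a popular vocabulary.

    The 'my:' terms and the rdfs: linking properties are hypothetical
    choices for this sketch.
    """
    schema_links = [
        ("my:Actor", "rdfs:subClassOf", "foaf:Person"),
        ("my:actorName", "rdfs:subPropertyOf", "foaf:name"),
    ]
    instance_data = [
        (actor_uri, RDF_TYPE, "my:Actor"),
        (actor_uri, "my:actorName", name),
    ]
    return schema_links + instance_data


if __name__ == "__main__":
    for triple in model_pop("ex:actor1", "Jane Doe"):
        print("pop: ", triple)
    for triple in model_link("ex:actor1", "Jane Doe"):
        print("link:", triple)
```

In the pop model a consumer sees `foaf:` terms directly in the data; in the link model the same meaning is only reachable by following the schema-level links.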
Ranking Task T2
Find appropriate mix of different vocabularies
• Domain: Publications and authors
• Vocabulary reuse strategies:
1. minV: Reuse a minimum amount of vocabularies
2. max: Reuse a maximum amount of vocabularies (lower boundary)
3. pop: Reuse popular vocabularies
4. minC: Reuse a minimum amount of vocabularies per concept
• Number of possible models to choose from: 4
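The difference between minV and minC comes down to what is counted: vocabularies over the whole model, or vocabularies per concept. A rough sketch of both counts follows, assuming a model is represented as a mapping from concept to the prefixed terms describing it. The `MODEL` contents and function names are hypothetical and not one of the survey's candidate models.

```python
# Hypothetical toy model (NOT one of the survey's models): each concept is
# mapped to the prefixed vocabulary terms used to describe it.
MODEL = {
    "Publication": ["dcterms:title", "dcterms:creator", "bibo:doi"],
    "Author": ["foaf:name", "foaf:mbox"],
}


def prefix_of(term):
    """Return the vocabulary prefix of a prefixed term."""
    return term.split(":", 1)[0]


def vocabularies_in_model(model):
    """Distinct vocabularies over the whole model (what minV minimizes)."""
    return {prefix_of(t) for terms in model.values() for t in terms}


def vocabularies_per_concept(model):
    """Number of distinct vocabularies per concept (what minC minimizes)."""
    return {
        concept: len({prefix_of(t) for t in terms})
        for concept, terms in model.items()
    }


if __name__ == "__main__":
    print(sorted(vocabularies_in_model(MODEL)))
    print(vocabularies_per_concept(MODEL))
```

A model can score well on one count and poorly on the other: using one vocabulary per concept but a different one for each concept keeps the per-concept count at 1 while the model-wide count grows.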
Ranking Task T3
Vocabulary reuse given additional meta-information
• Domain: Music and musical artists
• Vocabulary reuse strategies:
1. minD: Reuse only domain specific vocabularies
2. minV: Reuse a minimum amount of vocabularies
3. pop: Reuse popular vocabularies
• Number of possible models to choose from: 3
Results of Ranking Tasks
Key insights
• Reusing over interlinking
• Popular vocabularies over minimizing number of vocabularies
• Additional meta-information has an effect on choice
Meta-Information Useful?
Key insights
• No definite favorite support
• # of datasets using a vocabulary over total term occurrence
• Information on most common use by others: not valuable
Aspects for vocabulary reuse
[Figure: bar chart of ratings on a 5-point Likert scale (axis 0 to 5) for the aspects Clear Data Structure, Data easier to be consumed, and Ontological Agreement; series: before ranking tasks, after first ranking task, after second ranking task]
Participants
• Linked Data experts and practitioners
• Acquired through LOD and Semantic Web mailing lists
• N = 79 (16 female, 63 male) (n.s. difference in answers)
• 67% academia, 23% industry, 10% both
• Research associates (22), postdocs (14), professors (8), engineers and other professions (27)
• Age: M = 34.6, SD = 8.6
• Experience in LOD (in years): M = 4, SD = 2.64
• Expertise in consuming and publishing LOD: M = 3.64, SD = 1 (on a 5-point Likert scale) (n.s. difference in answers of group > 4 and group < 4)
Conclusion
• Which aspects are most important?
– All aspects are “somewhat important” (Mdn = 4)
– Aspects are rated higher in theory than in real life
• Which strategy to follow?
– Preferred choice: reuse popular vocabularies (better than minimizing the number of vocabularies)
– Popular vs. domain-specific vocabularies: unclear
– Interlinking does not have good uptake
• Which meta-information is most useful?
– # of datasets using a vocabulary
– Most common use does not have good uptake
Questions?
Thank you very much for participating in the survey and helping me with my research