Rough Set Semantics for Identity Management on the Web
Presented at the AAAI Fall Symposium for Big Data on 2013-11-15.
Wouter Beek (wouterbeek.com)
Stefan Schlobach
Frank van Harmelen
Problems of identity
• Statements hold only in certain contexts (identicals cannot always be substituted salva veritate).
• Identity is mistaken for representation.
• Identity is mistaken for (close) relatedness.
But more importantly:
• Semantics: identity assertion (a claim about meaning)
• Pragmatics: data linking (importing additional properties)
• Due to: the Open World Assumption
owl:differentFrom(Semantics,Pragmatics)
SEMANTICS iff PRACTICE
“Link your data to other people’s data to provide context.”
[5-star LOD]
“RDF links often have the owl:sameAs predicate.”
[VoID]
Can Leibniz help?
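• Indiscernibility of identicals (Leibniz’ principle): $x = y \rightarrow \forall P\,(P(x) \leftrightarrow P(y))$
• Identity of indiscernibles: $\forall P\,(P(x) \leftrightarrow P(y)) \rightarrow x = y$
• Trivially true, since $\lambda z.\,(z = x)$ is one of the $P$’s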
Solutions (as identified in the literature) [1/2]
1) Weaken owl:sameAs, e.g. skos:closeMatch
2) Extend owl:sameAs: annotate it with fuzziness or uncertainty.
3) Make contexts explicit, e.g. use named graphs or namespaces:
“That is the star that can be seen in the morning, but not in the evening”@geolocation
Solutions (as identified in the literature) [2/2]
4) Use domain-specific identity relations:
“x and y have the same medical use” @medicine
“x and y are the same molecule” @chemistry
5) Change modeling practice:
Notification upon read.
Require reciprocal confirmation upon change.
“On the Web of Data, anybody can say anything about anything.” [Van Harmelen]
Indiscernibility
Identity is the smallest equivalence relation.
Indiscernibility: resources are the same w.r.t. a limited set of predicates.
Indiscernibility is an equivalence relation (so it supports reasoning!), although not necessarily the smallest one.
Every indiscernibility relation is also an identity relation, but over a different domain:
• Example: take the set of people and an income property. In this context, indiscernibility induces the identity relation between income groups.
Indiscernibility 1
Two resources $x, y$ are indiscernible w.r.t. a set of predicates $P$ (predicate terms in $G$) if they share the predicate-object pairs for $P$:
$x \approx_P y \iff \forall p \in P: p(x) = p(y)$, where $p(x) = \{o \mid \langle x, p, o \rangle \in G\}$.
Example: “Wouter and Stefan have the same employer, so they are indiscernible w.r.t. the predicate hasEmployer.”
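A minimal Python sketch of Indiscernibility 1, assuming the graph is a plain set of (subject, predicate, object) triples. The function names, the placeholder employers `ex:org1` and `ex:org2`, and the third resource `ex:carol` are hypothetical; the `employedAs` values follow the example used later in this talk. Two resources count as indiscernible w.r.t. a set of predicates when, for each predicate, they have exactly the same set of objects.

```python
from collections import defaultdict

# Toy graph G as a set of (subject, predicate, object) triples.
# "ex:org1", "ex:org2" and "ex:carol" are hypothetical placeholders.
G = {
    ("Wouter", "hasEmployer", "ex:org1"),
    ("Stefan", "hasEmployer", "ex:org1"),
    ("ex:carol", "hasEmployer", "ex:org2"),
    ("Wouter", "employedAs", "PhD"),
    ("Stefan", "employedAs", "AssistantProfessor"),
}


def objects(graph, subject, predicate):
    """The set of objects o such that (subject, predicate, o) is in the graph."""
    return frozenset(o for s, p, o in graph if s == subject and p == predicate)


def indiscernible(graph, x, y, predicates):
    """x and y are indiscernible w.r.t. `predicates` iff they have the same
    objects for every predicate (both lacking a predicate counts as sharing)."""
    return all(objects(graph, x, p) == objects(graph, y, p) for p in predicates)


def partition(graph, predicates):
    """Equivalence classes of subjects under indiscernibility w.r.t. `predicates`."""
    classes = defaultdict(set)
    for subject in {s for s, _, _ in graph}:
        key = tuple(objects(graph, subject, p) for p in sorted(predicates))
        classes[key].add(subject)
    return list(classes.values())


print(indiscernible(G, "Wouter", "Stefan", {"hasEmployer"}))                # True
print(indiscernible(G, "Wouter", "Stefan", {"hasEmployer", "employedAs"}))  # False
```

Adding a predicate can only split classes, never merge them, which is why a larger predicate set yields a finer (closer to true identity) partition. In this sketch two resources that both lack a predicate agree on it; whether missing values should count as shared is a design choice, and a delicate one under the Open World Assumption.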
Indiscernibility 2
• We take a given identity relation and partition it into subsets (i.e. identity sub-relations) that are described in terms of the vocabulary.
• Subsets of the given identity relation are $P$-indiscernible, for sets of predicates $P$: all pairs in a subset are indiscernible w.r.t. the same set $P$.
Example:
• (Wouter and Albert) and (Stefan and Paul) belong to the same identity sub-relation, since each pair is indiscernible w.r.t. the same collection of properties.
• Wouter and Albert are “employedAs PhD”; Stefan and Paul are “employedAs Assistant Professor”.
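Indiscernibility 2 can be sketched in the same style. We assume that a pair is described by the set of predicates on which its two members share the same non-empty set of objects. The non-emptiness requirement is our assumption, added so that predicates absent from both resources do not count. The pairs (Wouter, Albert) and (Stefan, Paul) come from the example. The extra pair (Wouter, Stefan), the placeholder employer `ex:org1`, and all function names are hypothetical; they are included only to produce a second sub-relation.

```python
from collections import defaultdict

# Toy graph; the employedAs values follow the slide's example.
# "ex:org1" is a hypothetical placeholder for a shared employer.
G = {
    ("Wouter", "employedAs", "PhD"),
    ("Albert", "employedAs", "PhD"),
    ("Stefan", "employedAs", "AssistantProfessor"),
    ("Paul", "employedAs", "AssistantProfessor"),
    ("Wouter", "hasEmployer", "ex:org1"),
    ("Stefan", "hasEmployer", "ex:org1"),
}

# Hypothetical given identity relation (a set of resource pairs).
identity = {("Wouter", "Albert"), ("Stefan", "Paul"), ("Wouter", "Stefan")}


def objects(graph, subject, predicate):
    return frozenset(o for s, p, o in graph if s == subject and p == predicate)


def agreeing_predicates(graph, x, y):
    """Predicates for which x and y have the same, non-empty set of objects."""
    predicates = {p for _, p, _ in graph}
    return frozenset(
        p for p in predicates
        if objects(graph, x, p) and objects(graph, x, p) == objects(graph, y, p)
    )


def sub_relations(graph, pairs):
    """Partition `pairs` into identity sub-relations: pairs that agree on exactly
    the same set of predicates end up in the same block."""
    blocks = defaultdict(set)
    for x, y in pairs:
        blocks[agreeing_predicates(graph, x, y)].add((x, y))
    return dict(blocks)


for shared, block in sub_relations(G, identity).items():
    print(sorted(shared), sorted(block))
```

Both (Wouter, Albert) and (Stefan, Paul) agree only on employedAs, so they share a block even though their employedAs values differ. The key is the set of predicates, not the values.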
Example of an indiscernibility partition
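Rough set approximation
Let $\sim$ be the given identity relation and $U/{\approx}$ the partition of the candidate pairs $U$ induced by an indiscernibility relation $\approx$.
Higher (upper) approximation: $\overline{\sim} = \bigcup \{X \in U/{\approx} \mid X \cap {\sim} \neq \emptyset\}$
Lower approximation: $\underline{\sim} = \bigcup \{X \in U/{\approx} \mid X \subseteq {\sim}\}$
But what is $\approx$ (‘resemblance’)?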
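The two approximations are the standard (Pawlak) rough set constructions. Below is a minimal sketch that assumes the partition of candidate pairs is already given. Which indiscernibility relation should produce that partition is exactly the ‘resemblance’ question left open above. The pairs and blocks are hypothetical.

```python
# A universe of candidate pairs, already partitioned into blocks by some
# indiscernibility relation; the pairs and the partition are hypothetical.
blocks = [
    {("a", "b"), ("c", "d")},
    {("e", "f"), ("g", "h")},
    {("i", "j")},
]
identity = {("a", "b"), ("c", "d"), ("e", "f")}


def approximations(blocks, target):
    """Return the (lower, higher) rough set approximations of `target`."""
    lower, higher = set(), set()
    for block in blocks:
        if block <= target:   # block lies entirely inside the identity relation
            lower |= block
        if block & target:    # block contains at least one identity pair
            higher |= block
    return lower, higher


lower, higher = approximations(blocks, identity)
print(sorted(lower))   # [('a', 'b'), ('c', 'd')]
print(sorted(higher))  # [('a', 'b'), ('c', 'd'), ('e', 'f'), ('g', 'h')]
```

The pair ('g', 'h') is not an identity pair, but it enters the higher approximation because it is indiscernible from ('e', 'f'), which is.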
Example of indiscernibility approximations
Quality
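• Based on the rough set approximation: the ratio of the sizes of the lower and higher approximations, $\alpha(\sim) = |\underline{\sim}| \,/\, |\overline{\sim}|$.
• A consistently applied identity relation has relatively many partition sets that contain either no identity pairs (keeping $|\overline{\sim}|$ small) or only identity pairs (making $|\underline{\sim}|$ large), so a more consistent identity relation has a higher quality metric.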
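A short sketch of the quality metric as |lower| / |higher|, which is Pawlak's accuracy of approximation. The three partitions are hypothetical. They show how the metric drops as blocks mix identity pairs with non-identity pairs. Returning 1.0 when the higher approximation is empty is our own convention, not something the slides specify.

```python
def quality(blocks, target):
    """Accuracy of the rough set approximation of `target`: |lower| / |higher|.
    Returning 1.0 for an empty higher approximation is an assumed convention."""
    lower = set().union(*(b for b in blocks if b <= target))
    higher = set().union(*(b for b in blocks if b & target))
    return len(lower) / len(higher) if higher else 1.0


# Hypothetical identity relation and three hypothetical partitions of the pairs.
identity = {("a", "b"), ("c", "d"), ("e", "f")}
consistent = [{("a", "b"), ("c", "d"), ("e", "f")}, {("g", "h")}]
partly_mixed = [{("a", "b"), ("c", "d")}, {("e", "f"), ("g", "h")}]
fully_mixed = [{("a", "b"), ("g", "h")}, {("c", "d"), ("i", "j")}, {("e", "f"), ("k", "l")}]

print(quality(consistent, identity))    # 1.0
print(quality(partly_mixed, identity))  # 0.5
print(quality(fully_mixed, identity))   # 0.0
```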
Generalizations
• This works for any binary relation (not only owl:sameAs).
• We only discussed the identity of non-property resources, but properties can also be identical.
• We skipped the treatment of blank nodes and typed literals (which have special identity criteria).
• The indiscernibility ‘language’ can be made much stronger, allowing more fine-grained identity sub-relations:
  • Length-1 paths, e.g. “Wouter lives in the Netherlands.”
  • Length-2 paths, e.g. “Wouter lives in a country which borders Germany.”
  • Length-$n$ paths.
  • Intervals in the value space of typed literals, e.g. “was published between 1901 and 1905”
  • Natural language translation, e.g. “lives in Germany” and “lives in Deutschland”
Depth-$n$ Predicate Path Map (PPM)
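A sequence of predicates $\langle p_1, \ldots, p_n \rangle$ denoting a (functional) mapping from subject terms into sets of object terms:
$\langle p_1, \ldots, p_n \rangle(s) = \{o \mid \exists x_0, \ldots, x_n: x_0 = s,\ x_n = o,\ \langle x_{i-1}, p_i, x_i \rangle \in G \text{ for all } 1 \le i \le n\}$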
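A PPM can be evaluated by following its predicates one step at a time from the subject, keeping the set of nodes reached so far. This is a minimal sketch: the function name `follow` and the toy graph are hypothetical. The triples mirror the length-1 and length-2 path examples from the Generalizations slide, with Belgium added as a second border so that the result is visibly a set.

```python
def follow(graph, subject, path):
    """Evaluate a predicate path map <p1, ..., pn> on `subject`: the set of
    objects reached by following p1, then p2, ..., then pn through the graph."""
    frontier = {subject}
    for predicate in path:
        frontier = {o for s, p, o in graph if s in frontier and p == predicate}
    return frozenset(frontier)


# Toy graph based on the slide's length-1 and length-2 path examples.
G = {
    ("Wouter", "livesIn", "Netherlands"),
    ("Netherlands", "borders", "Germany"),
    ("Netherlands", "borders", "Belgium"),
}

print(sorted(follow(G, "Wouter", ("livesIn",))))            # ['Netherlands']
print(sorted(follow(G, "Wouter", ("livesIn", "borders"))))  # ['Belgium', 'Germany']
```

A depth-1 PPM reduces to the object set of a single predicate, so Indiscernibility 1 is the special case $n = 1$.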
Indiscernibility 1 (generalized)
Two resources $x, y$ are indiscernible w.r.t. a set of PPMs $\Pi$ if they share the properties denoted by $\Pi$: $x \approx_\Pi y \iff \forall \pi \in \Pi: \pi(x) = \pi(y)$.
Example: “Wouter and Stefan have the same employer, so they are indiscernible w.r.t. hasEmployer.”
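The generalized definition only swaps single predicates for PPMs. In this sketch the depth-1 PPM ⟨hasEmployer⟩ reproduces the example above. The depth-2 PPM ⟨hasEmployer, locatedIn⟩ shows how, in this toy graph, a longer path can make two resources indiscernible that the shorter path separates. The placeholder resources `ex:org1`, `ex:org2`, `ex:carol`, `ex:cityA`, the predicate `locatedIn`, and the function names are hypothetical.

```python
def follow(graph, subject, path):
    """Objects reached from `subject` by following the predicates in `path`."""
    frontier = {subject}
    for predicate in path:
        frontier = {o for s, p, o in graph if s in frontier and p == predicate}
    return frozenset(frontier)


def indiscernible(graph, x, y, ppms):
    """x and y are indiscernible w.r.t. a set of PPMs iff every PPM maps them
    to the same set of objects."""
    return all(follow(graph, x, path) == follow(graph, y, path) for path in ppms)


# Hypothetical graph: two placeholder employers located in the same placeholder city.
G = {
    ("Wouter", "hasEmployer", "ex:org1"),
    ("Stefan", "hasEmployer", "ex:org1"),
    ("ex:carol", "hasEmployer", "ex:org2"),
    ("ex:org1", "locatedIn", "ex:cityA"),
    ("ex:org2", "locatedIn", "ex:cityA"),
}

print(indiscernible(G, "Wouter", "Stefan", [("hasEmployer",)]))                # True
print(indiscernible(G, "Wouter", "ex:carol", [("hasEmployer",)]))              # False
print(indiscernible(G, "Wouter", "ex:carol", [("hasEmployer", "locatedIn")]))  # True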
Indiscernibility 2 (generalized)
We take a given set of pairs (e.g. an identity relation) and partition it into subsets that are described in terms of the schema.
Subsets of the given (identity) relation are $\Pi$-indiscernible, for sets of PPMs $\Pi$.
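A brute-force sketch of the generalized Indiscernibility 2. It enumerates every PPM up to a fixed depth, describes each pair by the PPMs that map both members to the same non-empty object set, and groups pairs with identical descriptions. The non-emptiness condition, the depth bound, the toy graph, and all names are assumptions for illustration. With $k$ predicates there are $k^n$ paths of length $n$, so this exhaustive enumeration only suits toy graphs.

```python
from collections import defaultdict
from itertools import product


def follow(graph, subject, path):
    """Objects reached from `subject` by following the predicates in `path`."""
    frontier = {subject}
    for predicate in path:
        frontier = {o for s, p, o in graph if s in frontier and p == predicate}
    return frozenset(frontier)


def all_ppms(graph, max_depth):
    """Every predicate path of length 1..max_depth over the graph's predicates."""
    predicates = sorted({p for _, p, _ in graph})
    return [path for n in range(1, max_depth + 1)
            for path in product(predicates, repeat=n)]


def agreeing_ppms(graph, x, y, ppms):
    """PPMs that map x and y to the same, non-empty set of objects."""
    return frozenset(
        path for path in ppms
        if follow(graph, x, path) and follow(graph, x, path) == follow(graph, y, path)
    )


def sub_relations(graph, pairs, max_depth):
    """Group pairs that agree on exactly the same set of PPMs."""
    ppms = all_ppms(graph, max_depth)
    blocks = defaultdict(set)
    for x, y in pairs:
        blocks[agreeing_ppms(graph, x, y, ppms)].add((x, y))
    return dict(blocks)


# Hypothetical graph and pairs (placeholder resources prefixed with "ex:").
G = {
    ("Wouter", "hasEmployer", "ex:org1"),
    ("Stefan", "hasEmployer", "ex:org1"),
    ("ex:carol", "hasEmployer", "ex:org2"),
    ("ex:org1", "locatedIn", "ex:cityA"),
    ("ex:org2", "locatedIn", "ex:cityA"),
}
pairs = {("Wouter", "Stefan"), ("Wouter", "ex:carol"), ("Stefan", "ex:carol")}

for shared, block in sub_relations(G, pairs, max_depth=2).items():
    print(sorted(shared), sorted(block))
```

(Wouter, Stefan) agree on both ⟨hasEmployer⟩ and ⟨hasEmployer, locatedIn⟩. The two pairs involving `ex:carol` agree only on the depth-2 path, so they form a separate sub-relation.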
Conclusion
Problem:
• There is a conflict between the semantics and the pragmatics of identity.
• This will not be fixed in the short term by using extensions to existing logics (e.g. contexts, fuzziness, probability).
Solution:
• Identify different identity relations automatically, and in terms of the domain predicates (no extra constructs are needed!).
• Define the meaning of a specific identity relation in terms of its indiscernibility criteria.