Dr. Bhavani Thuraisingham
The University of Texas at Dallas
Trustworthy Semantic Webs
October 2013
Data and Applications Security
Outline
Semantic web
XML and XML security
RDF and RDF security
Ontologies
Rules
Applications
Reference:
- Building Trustworthy Semantic Webs, Thuraisingham, CRC Press, 2007
Layered Approach: Tim Berners-Lee’s Vision (www.w3c.org)
What is XML all about?
XML is needed due to the limitations of HTML and complexities of SGML
It is an extensible markup language specified by the W3C (World Wide Web Consortium)
Designed to make the interchange of structured documents over the Internet easier
Key to XML used to be Document Type Definitions (DTDs)
- Defines the role of each element of text in a formal model
XML schemas have now become critical to specify the structure
- XML schemas are also XML documents
XML Elements
XML Statement: John Smith is a Professor in Texas
This can be expressed as follows:
<Professor>
<name> John Smith </name>
<state> Texas </state>
</Professor>
XML Elements
Now suppose this data can be read by anyone. Then we can augment the XML statement with an additional element called access, as follows.
<Professor>
<name> John Smith </name>
<state> Texas </state>
<access> All, Read </access>
</Professor>
XML Elements
If only HR can update this XML statement, then we have the following:
<Professor>
<name> John Smith </name>
<state> Texas </state>
<access> HR department, Write </access>
</Professor>
XML Elements
We may not wish for everyone to know that John Smith is a professor, but we can give out the information that this professor is in Texas.
This can be expressed as:
<Professor>
<name> John Smith, Govt-official, Read </name>
<state> Texas, All, Read </state>
<access> HR department, Write </access>
</Professor>
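The element-level annotations above can be sketched in code. The following is a minimal illustration, not the slide author's implementation: it assumes the access annotation is moved out of the element text into two hypothetical attributes, `subject` and `priv`, and that the special subject `All` matches everyone.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: the slide mixes the access annotation into the
# element text; here it is carried in "subject" and "priv" attributes.
DOC = """
<Professor>
  <name subject="Govt-official" priv="Read">John Smith</name>
  <state subject="All" priv="Read">Texas</state>
</Professor>
"""

def readable_view(xml_text, subject):
    """Return {tag: text} for the child elements this subject may read."""
    root = ET.fromstring(xml_text)
    view = {}
    for child in root:
        # "All" is assumed to grant the privilege to every subject.
        allowed = child.get("subject") in ("All", subject)
        if allowed and "Read" in child.get("priv", ""):
            view[child.tag] = child.text
    return view

print(readable_view(DOC, "Student"))
print(readable_view(DOC, "Govt-official"))
```

With this encoding, a general subject sees only the state, while a government official also sees the name.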
XML Attributes
Suppose we want to specify access based on attribute values. One way to specify such access is given below.
<Professor
Name = “John Smith”, Access = All, Read
Salary = “60K”, Access = Administrator, Read, Write
Department = “Security”, Access = All, Read
</Professor>
Here we assume that everyone can read the name John Smith and the department Security.
But only the administrator can read and write the salary attribute.
XML DTD
DTDs essentially specify the structure of XML documents.
Consider the following DTD for Professor with elements Name and State.
This will be specified as:
<!ELEMENT Professor (name, state)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT access (#PCDATA)>
XML Schema
While DTDs were the early attempts to specify structure for XML documents, XML schemas are far more elegant to specify structures.
Unlike DTDs, XML schemas essentially use the XML syntax for specification.
Consider the following example:
<ComplexType name = “ProfessorType”>
<Sequence>
<element name = “name” type = “string”/>
<element name = “state” type = “string”/>
<element name = “access” type = “string”/>
</Sequence>
</ComplexType>
XML Namespaces
Namespaces are used for DISAMBIGUATION
<CountryX:Academic-Institution
xmlns:CountryX = “http://www.CountryX.edu/Instution DTD”
xmlns:USA = “http://www.USA.edu/Instution DTD”
xmlns:UK = “http://www.UK.edu/Instution DTD”>
<USA:Title = College
USA:Name = “University of Texas at Dallas”
USA:State = “Texas”/>
<UK:Title = University
UK:Name = “Cambridge University”
UK:State = “Cambs”/>
</CountryX:Academic-Institution>
XML Namespaces
<CountryX:Academic-Institution
xmlns:CountryX = “http://www.CountryX.edu/Instution DTD”
xmlns:USA = “http://www.USA.edu/Instution DTD”
xmlns:UK = “http://www.UK.edu/Instution DTD”>
<Access> Government-official, Read </Access>
<USA:Title = College
USA:Name = “University of Texas at Dallas”
USA:State = “Texas”/>
<UK:Title = University
UK:Name = “Cambridge University”
UK:State = “Cambs”/>
</CountryX:Academic-Institution>
Federations/Distribution
Site 1 document:
<Professor-name>
<ID> 111 </ID>
<Name> John Smith </Name>
<State> Texas </State>
</Professor-name>
Site 2 document:
<Professor-salary>
<ID> 111 </ID>
<salary> 60K </salary>
</Professor-salary>
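The security concern with federated documents is that records held at separate sites can be joined on a shared key. The sketch below, an illustration under the assumption that both sites expose well-formed XML with a common `ID` element, shows how the name and salary become linked once the two documents are combined. The helper names `to_record` and `join_on_id` are hypothetical.

```python
import xml.etree.ElementTree as ET

SITE1 = ("<Professor-name><ID> 111 </ID><Name> John Smith </Name>"
         "<State> Texas </State></Professor-name>")
SITE2 = "<Professor-salary><ID> 111 </ID><salary> 60K </salary></Professor-salary>"

def to_record(xml_text):
    """Flatten a one-level XML document into a {tag: text} record."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text.strip() for child in root}

def join_on_id(left_xml, right_xml):
    """Merge two records when their ID fields agree, else return None."""
    left, right = to_record(left_xml), to_record(right_xml)
    if left["ID"] != right["ID"]:
        return None
    return {**left, **right}

merged = join_on_id(SITE1, SITE2)
print(merged)
```

A policy that protects the salary at Site 2 therefore also has to consider what a subject can already read at Site 1.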
Credentials in XML
<Professor credID = “9” subID = “16” CIssuer = “2”>
<name> Alice Brown </name>
<university> University of X </university>
<department> CS </department>
<research-group> Security </research-group>
</Professor>
<Secretary credID = “12” subID = “4” CIssuer = “2”>
<name> John James </name>
<university> University of X </university>
<department> CS </department>
<level> Senior </level>
</Secretary>
Note: This is SAML-like, but it is not SAML. SAML is now a standard way to represent credentials.
Policies in XML (we are using XPath to represent policies; XPath is outside of XML)
<?xml version = “1.0” encoding = “utf-8”?>
<Policy-base>
<policy-spec cred-expr = “//Professor[department = ‘CS’]” target = “annual_report.xml” path = “//Patent[@Dept = ‘CS’]//Node()” priv = “VIEW”/>
<policy-spec cred-expr = “//Professor[department = ‘CS’]” target = “annual_report.xml” path = “//Patent[@Dept = ‘EE’]/Short-descr/Node() and //Patent[@Dept = ‘EE’]/authors” priv = “VIEW”/>
<policy-spec cred-expr = - - - -
<policy-spec cred-expr = - - --
</Policy-base>
Explanation: CS professors are entitled to access all the patents of their department. They are entitled to see only the short descriptions and authors of patents of the EE department.
Note: XACML is now a standard way to represent access control policies.
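The way a policy base of this shape could be evaluated is sketched below. This is an assumption-laden illustration rather than the system described in the slides: the credential and report documents are invented sample data (including the `details` element), the `//` prefix is written `.//` because Python's `xml.etree.ElementTree` supports only a subset of XPath, and the `Node()` step is approximated by selecting elements.

```python
import xml.etree.ElementTree as ET

# Invented sample data for illustration only.
CREDENTIALS = """
<credentials>
  <Professor credID="9"><name>Alice Brown</name><department>CS</department></Professor>
  <Secretary credID="12"><name>John James</name><department>CS</department></Secretary>
</credentials>
"""

REPORT = """
<report>
  <Patent Dept="CS"><Short-descr>Secure XML</Short-descr><authors>A. Brown</authors><details>full CS text</details></Patent>
  <Patent Dept="EE"><Short-descr>Low-power chip</Short-descr><authors>C. Doe</authors><details>full EE text</details></Patent>
</report>
"""

# (credential expression, path into the target document, privilege)
POLICIES = [
    (".//Professor[department='CS']", ".//Patent[@Dept='CS']//*", "VIEW"),
    (".//Professor[department='CS']", ".//Patent[@Dept='EE']/Short-descr", "VIEW"),
    (".//Professor[department='CS']", ".//Patent[@Dept='EE']/authors", "VIEW"),
]

def viewable_text(subject_name):
    """Collect the text of every node the named subject may VIEW."""
    creds = ET.fromstring(CREDENTIALS)
    report = ET.fromstring(REPORT)
    visible = []
    for cred_expr, path, priv in POLICIES:
        # A policy applies when the subject holds a matching credential.
        holders = [c.findtext("name") for c in creds.findall(cred_expr)]
        if priv == "VIEW" and subject_name in holders:
            visible.extend(node.text for node in report.findall(path))
    return visible

print(viewable_text("Alice Brown"))
print(viewable_text("John James"))
```

The CS professor sees everything under the CS patent but only the short description and authors of the EE patent; the secretary, holding no matching credential, sees nothing.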
Access Control Strategy
Subjects request access to XML documents under two modes: browsing and authoring
- With browsing access, a subject can read/navigate documents
- Authoring access is needed to modify, delete, or append documents
The access control module checks the policy base and applies policy specs
Views of the document are created based on credentials and policy specs
In case of conflict, the least access privilege rule is enforced
Works for Push/Pull modes
System Architecture for Access Control
[Figure: access control architecture. Labels: User (Pull/Query, Push/Result), XML Documents, X-Access, X-Admin, Admin Tools, Policy base, Credential base]
Third-Party Architecture
[Figure: third-party architecture. Labels: Owner, Publisher, User/Subject, XML Source, Policy base, Credential base, SE-XML, Credentials, Query, Reply document]
The Owner is the producer of information. It specifies access control policies.
The Publisher is responsible for managing (a portion of) the Owner information and answering subject queries.
Goal: Untrusted Publisher with respect to Authenticity and Completeness checking
XML Databases
Data is presented as XML documents
Query language: XML-QL, XQuery
Query optimization
Managing transactions on XML documents
Metadata management: XML schemas/DTDs
Access methods and index strategies
XML security and integrity management
Inference/Privacy Control
[Figure: inference/privacy controller. Labels: Interface to the Semantic Web; Inference Engine/Rules Processor; Policies, Ontologies, Rules; XML Database; XML Documents, Web Pages, Databases; Technology by UTD]
Why RDF?
XML cannot be used to specify semantics
Example:
- Professor is a subclass of Academic Staff
- Professor inherits all properties of Academic Staff
RDF was specified so that the inadequacies of XML could be handled
RDF uses XML syntax
Additional constructs are needed for RDF
RDF
Resource Description Framework is the essence of the semantic web
Adds semantics with the use of ontologies, XML syntax
RDF Concepts
- Basic Model: Resources, Properties and Statements
- Container Model: Bag, Sequence and Alternative
RDF Basics
Resource: Everything is a resource
- Person, Vehicle, etc.
Property: properties describe relationships between resources
- E.g., Invented
Statement: (Object, Property, Value) triple
- Berners-Lee invented the Semantic Web
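The basic model can be mirrored with a plain data structure. The sketch below is only an illustration: the `Triple` type and the `values_of` helper are hypothetical names, and the second statement borrows the "Professor" title used elsewhere in these slides.

```python
from collections import namedtuple

# One RDF statement: (object, property, value), following the slide's naming.
Triple = namedtuple("Triple", ["object", "property", "value"])

statements = [
    Triple("Berners Lee", "invented", "Semantic Web"),
    Triple("Berners Lee", "title", "Professor"),
]

def values_of(obj, prop):
    """Return every value recorded for the given object and property."""
    return [t.value for t in statements if t.object == obj and t.property == prop]

print(values_of("Berners Lee", "invented"))
```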
RDF Container Model
Bag: Unordered container, may contain multiple occurrences
- rdf:Bag
Seq: Ordered container, may contain multiple occurrences
- rdf:Seq
Alt: a set of alternatives
- rdf:Alt
RDF Specification
<rdf:RDF
xmlns:rdf = “http://www.w3.org/1999/02/22-rdf-syntax-ns#”
xmlns:xsd = “http:// - - -
xmlns:uni = “http:// - - - -
<rdf:Description rdf:about = “949352”>
<uni:name> Berners Lee </uni:name>
<uni:title> Professor </uni:title>
</rdf:Description>
<rdf:Description rdf:about = “ZZZ”>
<uni:bookname> semantic web </uni:bookname>
<uni:authoredby> Berners Lee </uni:authoredby>
</rdf:Description>
</rdf:RDF>
RDF Specification
RDF specifications have been given for Attributes, Types, Nesting, Containers, etc.
How can security policies be included in the specification?
Example: consider the statement “Berners Lee is the Author of the book Semantic Web”
Do we allow access to the connection between author and book?
Do we allow access to the connection but not to the author name and book name?
RDF Policy Specification
<rdf:RDF
xmlns:rdf = “http://www.w3.org/1999/02/22-rdf-syntax-ns#”
xmlns:xsd = “http:// - - -
xmlns:uni = “http:// - - - -
<rdf:Description rdf:about = “949352”>
<uni:name> Berners Lee </uni:name>
<uni:title> Professor </uni:title>
Level = L1
</rdf:Description>
<rdf:Description rdf:about = “ZZZ”>
<uni:bookname> semantic web </uni:bookname>
<uni:authoredby> Berners Lee </uni:authoredby>
Level = L2
</rdf:Description>
</rdf:RDF>
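One way to read the level annotations is as labels that filter what a subject can see. The sketch below assumes, without confirmation from the slides, that L2 is more restrictive than L1 and that a subject cleared at a level may read everything at or below it. The `LEVEL_RANK` table and `visible_triples` helper are hypothetical.

```python
# Assumption: L2 dominates L1 (a subject cleared at L2 can also read L1 data).
LEVEL_RANK = {"L1": 1, "L2": 2}

# (resource, property, value, level), mirroring the two labeled descriptions.
LABELED = [
    ("949352", "name", "Berners Lee", "L1"),
    ("949352", "title", "Professor", "L1"),
    ("ZZZ", "bookname", "semantic web", "L2"),
    ("ZZZ", "authoredby", "Berners Lee", "L2"),
]

def visible_triples(clearance):
    """Return the triples whose level does not exceed the subject's clearance."""
    limit = LEVEL_RANK[clearance]
    return [(s, p, v) for s, p, v, level in LABELED if LEVEL_RANK[level] <= limit]

print(visible_triples("L1"))
print(len(visible_triples("L2")))
```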
RDF Schema
Need RDF Schema to specify statements such as professor is a subclass of academic staff
<rdfs:Class rdf:ID = “professor”>
<rdfs:comment>
The class of Professors
All professors are Academic Staff Members.
</rdfs:comment>
<rdfs:subClassOf rdf:resource = “academicStaffMember”/>
</rdfs:Class>
RDF Schema: Security Policies
How can security policies be specified?
<rdfs:Class rdf:ID = “professor”>
<rdfs:comment>
The class of Professors
All professors are Academic Staff Members.
</rdfs:comment>
<rdfs:subClassOf rdf:resource = “academicStaffMember”/>
Level = L
</rdfs:Class>
RDF Axiomatic Semantics
First order logic to specify formulas and inferencing
- Built in functions (First) and predicates (Type)
- Modus Ponens
- From A and If A then B, deduce B
Example: All containers are Resources
- Type(?c, Container) → Type(?c, Resource)
- If we have Type(A, Container) then we can infer Type(A, Resource)
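The container example is a single application of modus ponens. A minimal sketch, with facts represented as hypothetical `(predicate, subject, object)` tuples:

```python
facts = {("Type", "A", "Container")}

def apply_container_rule(known):
    """Apply Type(?c, Container) -> Type(?c, Resource) to every known fact."""
    derived = set(known)
    for predicate, subject, obj in known:
        if predicate == "Type" and obj == "Container":
            derived.add(("Type", subject, "Resource"))
    return derived

print(sorted(apply_container_rule(facts)))
```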
RDF Inferencing
While first order logic provides a proof system, it will be computationally infeasible
As a result, Horn clause logic was developed for logic programming; this is still computationally expensive
RDF uses If-then rules
IF E contains the triples (?u, rdfs:subClassOf, ?v)
and (?v, rdfs:subClassOf, ?w)
THEN
E also contains the triple (?u, rdfs:subClassOf, ?w)
That is, if u is a subclass of v, and v is a subclass of w, then u is a subclass of w
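The subclass rule can be applied repeatedly until no new triples appear. The sketch below is a naive fixpoint loop written for clarity, not efficiency; the class `staffMember` is a hypothetical superclass added so that the rule has something to derive.

```python
SUBCLASS = "rdfs:subClassOf"

def subclass_closure(triples):
    """Add (u, subClassOf, w) whenever (u, subClassOf, v) and (v, subClassOf, w) hold."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for u, p1, v in list(closed):
            for v2, p2, w in list(closed):
                if p1 == SUBCLASS and p2 == SUBCLASS and v == v2:
                    new_triple = (u, SUBCLASS, w)
                    if new_triple not in closed:
                        closed.add(new_triple)
                        changed = True
    return closed

E = {
    ("professor", SUBCLASS, "academicStaffMember"),
    ("academicStaffMember", SUBCLASS, "staffMember"),  # hypothetical superclass
}
print(sorted(subclass_closure(E)))
```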
RDF Query
One can query RDF using XML, but this will be very difficult as RDF is much richer than XML
Is there an analogy between say XQuery and a query language for RDF?
RQL, an SQL-like language, has been developed for RDF
- Select from “RDF document” where some “condition”
SPARQL is now a standard query language for RDF
SWRL (Semantic Web Rule Language) is a combination of rules (RuleML) and OWL
Policies in RDF
How can policies be specified?
Should policies be specified as shown in the examples, as extensions to RDF syntax?
Should policies be specified as RDF documents?
Is there an analogy to XPath expressions for RDF policies?
- <policy-spec cred-expr = “//Professor[department = ‘CS’]” target = “annual_report.xml” path = “//Patent[@Dept = ‘CS’]//Node()” priv = “VIEW”/>
Inference/Privacy Control
[Figure: inference/privacy controller for RDF. Labels: SPARQL: Interface to the Semantic Web; Inference Engine/RDF Reasoner (Pellet); Policies, Ontologies, Rules; RDF Data Manager (Jena); RDF Documents, Web Pages, Databases; Technology by UTD]
Ontology
Common definitions for any entity, person or thing
Several ontologies have been defined and are available for use
Defining a common ontology for an entity is a challenge
Mappings have to be developed for multiple ontologies
Specific languages have been developed for ontologies
Why is RDF not sufficient?
RDF was developed as XML is not sufficient to specify semantics
- E.g., class/subclass relationship
RDF has issues also
- Cannot express several other properties such as Union, Intersection, relationships, etc.
Need a richer language
Ontology languages were developed by the semantic web community for this purpose
Essentially RDF is not sufficient to specify ontologies
Security and Ontology
Ontologies used to specify security policies
- Example: OWL to specify security policies
- Choice between XML, RDF, OWL, RuleML, etc.
Security for Ontologies
- Access control on Ontologies
- Give access to certain parts of the Ontology
OWL: Background
It’s a language for ontologies and relies on RDF
DARPA (Defense Advanced Research Projects Agency) developed an early language, DAML (DARPA Agent Markup Language)
Europeans developed OIL (Ontology Inference Layer)
DAML+OIL combines both and was the starting point for OWL
OWL was developed by W3C
OWL Features
Subclass relationship
Class membership
Equivalence of classes
Classification
Consistency checking (e.g., detecting that x is an instance of A, A is a subclass of B, yet x is not an instance of B)
Three types of OWL: OWL-Full, OWL-DL, OWL-Lite
Automated tools for managing ontologies
- Ontology engineering
OWL Specification (e.g., Classes)
<owl:Class rdf:about = “#associateProfessor”>
<owl:disjointWith rdf:resource = “#professor”/>
<owl:disjointWith rdf:resource = “#assistantProfessor”/>
</owl:Class>
<owl:Class rdf:ID = “faculty”>
<owl:equivalentClass rdf:resource = “#academicStaffMember”/>
</owl:Class>
Faculty and Academic Staff Member are the same
An associate professor is not a professor
An associate professor is not an assistant professor
OWL Specification (e.g., Property)
Courses are taught by Academic staff members
<owl:ObjectProperty rdf:about = “#isTaughtBy”>
<rdfs:domain rdf:resource = “#course”/>
<rdfs:range rdf:resource = “#academicStaffMember”/>
<rdfs:subPropertyOf rdf:resource = “#involves”/>
</owl:ObjectProperty>
OWL Specification (e.g., Property Restriction)
All first year courses are taught only by professors
<owl:Class rdf:about = “#firstYearCourse”>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource = “#isTaughtBy”/>
<owl:allValuesFrom rdf:resource = “#professor”/>
</owl:Restriction>
</rdfs:subClassOf>
</owl:Class>
Policies in OWL
How can policies be specified?
Should policies be specified as shown in the examples, as extensions to OWL syntax?
Should policies be specified as OWL documents?
Is there an analogy to XPath expressions for OWL policies?
- <policy-spec cred-expr = “//Professor[department = ‘CS’]” target = “annual_report.xml” path = “//Patent[@Dept = ‘CS’]//Node()” priv = “VIEW”/>
Policies in OWL: Example
<owl:Class rdf:about = “#associateProfessor”>
<owl:disjointWith rdf:resource = “#professor”/>
<owl:disjointWith rdf:resource = “#assistantProfessor”/>
Level = L1
</owl:Class>
<owl:Class rdf:ID = “faculty”>
<owl:equivalentClass rdf:resource = “#academicStaffMember”/>
Level = L2
</owl:Class>
Logic and Inference
First order predicate logic
High level language to express knowledge
Well understood semantics
Logical consequence: inference
Proof systems exist
Sound and complete
OWL is based on a subset of logic: description logic
Why Rules?
RDF is built on XML and OWL is built on RDF
We can express subclass relationships in RDF; additional relationships can be expressed in OWL
However, reasoning power is still limited in OWL
Therefore the need for rules, and subsequently a markup language for rules so that machines can understand them
Example Rules
Studies(X,Y), Lives(X,Z), Loc(Y,U), Loc(Z,U) → HomeStudent(X)
i.e., if John studies at UTDallas, John lives on Campbell Road, and both Campbell Road and UTDallas are located in Richardson, then John is a home student
Note that
Person(X) → Man(X) or Woman(X) is not a rule in predicate logic
That is, if X is a person then X is either a man or a woman. This can be expressed in OWL
However, we can have a rule of the form
Person(X) and Not Man(X) → Woman(X)
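The HomeStudent rule from this slide can be evaluated directly over a small fact base. The sketch below hard-codes that one rule rather than implementing a general rule engine; the relation names follow the slide and the data are the John/UTDallas example.

```python
# Facts, one set of pairs per predicate.
studies = {("John", "UTDallas")}
lives = {("John", "Campbell Road")}
loc = {("UTDallas", "Richardson"), ("Campbell Road", "Richardson")}

def locations_of(place):
    """All U such that Loc(place, U) holds."""
    return {u for p, u in loc if p == place}

def home_students():
    """Studies(X,Y), Lives(X,Z), Loc(Y,U), Loc(Z,U) -> HomeStudent(X)."""
    result = set()
    for x, y in studies:
        for x2, z in lives:
            # Both body atoms must bind the same X and share a location U.
            if x == x2 and locations_of(y) & locations_of(z):
                result.add(x)
    return result

print(home_students())
```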
Monotonic Rules
Mother(X,Y)
Mother(X,Y) → Parent(X,Y)
If Mary is the mother of John, then Mary is the parent of John
Syntax: Facts and Rules
Rule is of the form:
B1, B2, ..., Bn → A
That is, if B1, B2, ..., Bn hold then A holds
Logic Programming
Deductive logic programming is in general based on deduction
- i.e., Deduce data from existing data and rules
- e.g., Father of a father is a grandfather, John is the father of Peter and Peter is the father of James and therefore John is the grandfather of James
Inductive logic programming induces rules from the data
- e.g., John is the father of Peter, Peter is the father of James, John is the grandfather of James, James is the father of Robert, Peter is the grandfather of Robert
- From the above data, induce that the father of a father is a grandfather
Popular in Europe and Japan
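The deductive half of this slide amounts to joining the father relation with itself. A minimal sketch using the John/Peter/James facts:

```python
father = {("John", "Peter"), ("Peter", "James")}

def grandfathers(father_facts):
    """Father(A,B), Father(B,C) -> Grandfather(A,C)."""
    return {
        (a, c)
        for a, b in father_facts
        for b2, c in father_facts
        if b == b2
    }

print(grandfathers(father))
```

Inductive logic programming runs in the other direction: given father and grandfather facts as examples, it searches for a rule such as the one encoded in `grandfathers`.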
Nonmonotonic Rules
If we have X and NOT X, we do not treat them as inconsistent as in the case of monotonic reasoning.
For example, consider an apartment that is acceptable to John. In general John is prepared to rent an apartment unless the apartment has fewer than two bedrooms, does not allow pets, etc. This can be expressed as follows:
→ Acceptable(X)
Bedroom(X,Y), Y<2 → NOT Acceptable(X)
NOT Pets(X) → NOT Acceptable(X)
Note that there could be a contradiction. But with nonmonotonic reasoning this is allowed.
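One common way to resolve such contradictions is to let the more specific exception rules override the default. The sketch below assumes that priority ordering, which the slide does not state; the dictionary keys `bedrooms` and `pets` are hypothetical field names.

```python
def acceptable(apartment):
    """Default rule: acceptable, unless an exception rule fires.

    Assumption: exceptions take priority over the default conclusion.
    """
    exceptions = [
        lambda a: a["bedrooms"] < 2,   # Bedroom(X,Y), Y<2 -> NOT Acceptable(X)
        lambda a: not a["pets"],       # NOT Pets(X) -> NOT Acceptable(X)
    ]
    return not any(rule(apartment) for rule in exceptions)

print(acceptable({"bedrooms": 3, "pets": True}))
print(acceptable({"bedrooms": 1, "pets": True}))
```

Adding a new fact about an apartment (for example, that pets are not allowed) can withdraw a conclusion that held before, which is what makes the reasoning nonmonotonic.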
Rule Markup
The various components of logic are expressed in the Rule Markup Language – RuleML
Both monotonic and nonmonotonic rules can be represented
Example representation of Fact P(a) - a is a parent
<fact>
<atom>
<predicate>p</predicate>
<term>
<const>a</const>
</term>
</atom>
</fact>
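A fact in this markup can also be generated programmatically. The sketch below builds the same element structure with Python's standard library; `make_fact` is a hypothetical helper, and the element names simply follow the slide rather than any particular RuleML release.

```python
import xml.etree.ElementTree as ET

def make_fact(predicate, constant):
    """Build <fact><atom><predicate/><term><const/></term></atom></fact>."""
    fact = ET.Element("fact")
    atom = ET.SubElement(fact, "atom")
    ET.SubElement(atom, "predicate").text = predicate
    term = ET.SubElement(atom, "term")
    ET.SubElement(term, "const").text = constant
    return fact

xml_text = ET.tostring(make_fact("p", "a"), encoding="unicode")
print(xml_text)
```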
Policies in RuleML
<fact>
<atom>
<predicate>p</predicate>
<term>
<const>a</const>
</term>
</atom>
Level = L
</fact>
Semantic Access Control (SAC)
[Figure: Traditional Access Control and the Semantic Web combine into Semantic Access Control]
Motivation
Shortcomings of Traditional Access Control
- Proprietary systems
- Lack of modularity
- Changes in access control schemas break the system
- Changes in data schemas break the system
- Path to resources (e.g., XPATH) is clumsy
//school/department/professor/personal/ssn – LONG!
- Non-optimal for distributed/federation environment
Modularity Problem
People this policy applies to
Resources this policy applies to
Actions allowed for this policy
[Figure label: Target Box]
SAC Ontology
Written in OWL (Web Ontology Language)
User-centric
Modular
Easily extensible
Available at:
http://utd61105.campus.ad.utdallas.edu/geo/voc/newaccessonto
SAC Components
Subjects: Software Agents or Human clients
Resources: Assets exposed through WS
Actions: Read, Write, Execute
Conditions: Additional constraints (e.g., geospatial parameters) on policy enforcement
[Figure: Policy Set. Labels: Resources, Subjects, Actions, Condition]
Application: Geo-WS Security
Data providers (e.g., geospatial clearinghouses, research centers) need access control on serviceable resources.
Access policies have geospatial dimension
- Bob has access on Building A
- Bob does NOT have access on Building B
- Building A and B have overlapping area
Current access control mechanisms are static and non-modular.
Geo-WS Security: Architecture
[Figure: Geo-WS security architecture. Labels: Web Service Client Side, Web Service Provider Side, Client, DAGIS, Geospatial Semantic WS Provider, Enforcement Module, Decision Module, Authorization Module, Semantic-enabled Policy DB]
Geo-WS Security: Semantics
Policy rules are based on description logic (DL).
DL allows machine-processed deductions on policy base.
Example 1:
- DL Rule: ‘Stores’ Inverse ‘Is Stored In’
- Fact: Airplane_Hanger(X) ‘stores’ Airplane(Y)
Example 2:
- DL Rule: ‘Is Located In’ is Transitive.
- Fact: Polygon(S) ‘Is Located In’ Polygon(V)
Polygon(V) ‘Is Located In’ Polygon(T)
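The two examples can be run through a small inference loop that knows about inverse and transitive properties. This is an illustrative sketch, not a description logic reasoner: the property identifiers `stores`, `isStoredIn`, and `isLocatedIn` are hypothetical spellings of the slide's property names.

```python
def infer(facts, inverse_of, transitive):
    """Saturate a set of (subject, property, object) facts under two rule types."""
    derived = set(facts)
    while True:
        new = set()
        for s, p, o in derived:
            if p in inverse_of:
                # Inverse property: swap subject and object.
                new.add((o, inverse_of[p], s))
            if p in transitive:
                # Transitive property: chain s -> o -> o2.
                for s2, p2, o2 in derived:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= derived:
            return derived
        derived |= new

facts = {
    ("Airplane_Hanger(X)", "stores", "Airplane(Y)"),
    ("Polygon(S)", "isLocatedIn", "Polygon(V)"),
    ("Polygon(V)", "isLocatedIn", "Polygon(T)"),
}
inverse_of = {"stores": "isStoredIn", "isStoredIn": "stores"}
transitive = {"isLocatedIn"}

for fact in sorted(infer(facts, inverse_of, transitive)):
    print(fact)
```

The deduced facts are that Airplane(Y) is stored in Airplane_Hanger(X) and that Polygon(S) is located in Polygon(T); a policy on Polygon(T) may therefore need to cover Polygon(S) as well.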
Secure Inferencing
[Figure: secure inferencing. Labels: Geospatial Data Store, Semantic-enabled Policy DB, Inferencing Module, Obvious facts, Deduced facts]
Geo-WS Security: Example
Resource :=
Washington, Oregon, California, West Coast
Rule:=
West Coast = WA Union OR Union CA
Policy:=
- Subject:= Bob
- Resources:= WA, OR, CA
- Action:=Read
Query: Retrieve Interstate Highway topology of West Coast
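The point of this example is that the query names a resource (West Coast) that appears in no policy, yet the rule lets the system decide it from Bob's permissions on the parts. A minimal sketch, assuming a composite resource is permitted only when every constituent is; `REGION_RULES`, `POLICIES`, and `is_permitted` are hypothetical names.

```python
# Rule: West Coast = WA Union OR Union CA
REGION_RULES = {"West Coast": {"WA", "OR", "CA"}}

POLICIES = [
    {"subject": "Bob", "resources": {"WA", "OR", "CA"}, "action": "Read"},
]

def expand(resource):
    """Replace a composite resource by its constituent resources."""
    return REGION_RULES.get(resource, {resource})

def is_permitted(subject, action, resource):
    """Permit only if every constituent of the resource is granted."""
    granted = set()
    for policy in POLICIES:
        if policy["subject"] == subject and policy["action"] == action:
            granted |= policy["resources"]
    return expand(resource) <= granted

print(is_permitted("Bob", "Read", "West Coast"))
```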
SAC in Action
Environment: University Campus
Campus Ontology
http://utd61105.campus.ad.utdallas.edu/geo/voc/campusonto
Main Resources
- Computer Science Building
- Pharmacy Building
- Electric Generator in each Building
SAC in Action
User Access:
- Bob has ‘execute’ access to all Building Resources
- Bob doesn’t have any access to CS Building
- Bob has ‘modify’ access to Building resources within a certain geographic extent
Policy File located at
http://utd61105.campus.ad.utdallas.edu/geo/voc/policyfile1
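Bob's first two entries conflict for any resource inside the CS Building. The sketch below resolves the conflict by letting the denial win, in the spirit of the least access privilege rule mentioned earlier; that choice, the resource names, and the `decide` helper are assumptions made for illustration, and the geographic-extent condition is left out.

```python
# Which building each resource belongs to (hypothetical resource names).
RESOURCES = {
    "CS Generator": "Computer Science Building",
    "Pharmacy Generator": "Pharmacy Building",
}

PERMITS = [("Bob", "execute", "*")]             # execute on all building resources
DENIALS = [("Bob", "Computer Science Building")]  # no access to the CS Building

def decide(subject, action, resource):
    """Denials override permits (assumed conflict-resolution rule)."""
    if (subject, RESOURCES[resource]) in DENIALS:
        return False
    return any(
        s == subject and a == action and r in ("*", resource)
        for s, a, r in PERMITS
    )

print(decide("Bob", "execute", "Pharmacy Generator"))
print(decide("Bob", "execute", "CS Generator"))
```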
SAC Improvements
Subjects, Resources, Actions and Conditions are defined independently
Reduced policy look-up cost: only policies related to the requester are processed
No long path name!
Distributed Access Control
[Figure: distributed access control. Labels: Client Query Interface, Middleware, Travel Site, Reimbursement Site, Bank Site, Travel Data & Ontology, Reimbursement Data, Bank Site & Ontology]
Common Threads and Challenges
Common Threads
- Building Ontologies for Semantics
- XML for Syntax
Challenges
- Scalability, Resolvability
- Security policy specification, securing the documents and ontologies
- Developing applications for secure semantic web technologies
- Automated tools for ontology management: creating, maintaining, evolving and querying ontologies