Nancy Ide
Vassar College
USA
Resource Description Framework
A Tutorial
EUROLAN 2003 • July 28 - August 8 • Bucharest - Romania
EUROLAN 2003 THE SEMANTIC WEB AND LANGUAGE TECHNOLOGY July 28 - August 8, Bucharest - Romania
Outline
The Semantic Web: Where RDF fits in
RDF overview
Concepts
Data Model
RDF Syntax
RDF Schema
RDF, RDFS and language technology
Overview
What is the Semantic Web?
“a conceptual information space in which resources identified by URIs can be processed by machines”
Relies on three key elements:
identification of resources
defining the semantics of resource descriptions and relationships among resources
inferring new knowledge from available information
All of this must be done using common, machine-processable notations
Supporting Technologies
The Layer-cake model
XML
RDF, RDF Schema
Ontologies (OWL)
Rules
Logic Framework
The Base: XML
Provides a common syntax for marking up documents
Data model: ordered, labeled tree
<bookInfo>
  <title>The Royal Navy</title>
  <author>
    <persName type="pen name">
      <title>Sir</title>
      <foreName>Edward</foreName>
      <surName>Bulwer-Lytton</surName>
      <roleName>Baron Lytton of <placeName>Kenworth</placeName></roleName>
    </persName>
  </author>
</bookInfo>
[Figure: the same markup drawn as a tree]
bookInfo
  title
  author
    persName
      title
      foreName
      surName
      roleName
        placeName
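As a rough illustration of the ordered, labeled tree model, the sketch below parses the bookInfo example with Python's standard xml.etree.ElementTree module and lists each element with its depth. The straight quotes and the consistent roleName casing are assumptions needed to make the fragment well-formed, and the helper name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

# The bookInfo example, normalized so that it is well-formed XML.
BOOK_INFO = """
<bookInfo>
  <title>The Royal Navy</title>
  <author>
    <persName type="pen name">
      <title>Sir</title>
      <foreName>Edward</foreName>
      <surName>Bulwer-Lytton</surName>
      <roleName>Baron Lytton of <placeName>Kenworth</placeName></roleName>
    </persName>
  </author>
</bookInfo>
"""


def outline(element, depth=0):
    """Return (depth, tag) pairs for the element and its descendants, in document order."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


tree_rows = outline(ET.fromstring(BOOK_INFO))
for depth, tag in tree_rows:
    print("  " * depth + tag)
```

The parser recovers the nesting and the labels, but nothing in the tree says what the nesting or the two title elements mean.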
Why Do We Need RDF?
XML provides only impoverished semantics
What the human sees:
<bookInfo>
  <title>The Royal Navy</title>
  <author>
    <persName type="pen name">
      <title>Sir</title>
      <foreName>Edward</foreName>
      <surName>Bulwer-Lytton</surName>
      <roleName>Baron Lytton of <placeName>Kenworth</placeName></roleName>
    </persName>
  </author>
</bookInfo>
What the computer sees:
<X356T0>
  <Y71109>The Royal Navy</Y71109>
  <KH561F>
    <L098JN type="pen name">
      <Y71109>Sir</Y71109>
      <XXS553>Edward</XXS553>
      <NJK098>Bulwer-Lytton</NJK098>
      <R4W23T>Baron Lytton of <PPY6G1>Kenworth</PPY6G1></R4W23T>
    </L098JN>
  </KH561F>
</X356T0>
XML “semantics”
No agreement on:
structure
what does nesting mean? Part-of? Something else?
is bookInfo an object? class? attribute? relation? something else?
vocabulary
do both title elements mean the same thing?
is author the same as creator?
RDF: Resource Description Framework
Provides a way to give information a meaning that is machine-processable
W3C Recommendation: http://www.w3c.org/RDF
A data model for describing data about data (metadata)
RDF
Three object types
Resources
Things being described by RDF expressions. Resources are always named by URIs
e.g., HTML Document, specific XML element within the document source, a collection of pages, a book
Properties
Specific aspect, characteristic, attribute or relation used to describe a resource
e.g., Creator, Title, Name
Statements
Resource (Subject) + Property (Predicate) + Property Value (Object)
The Data Model
RDF Statements
Three parts: subject, predicate, object
describe properties of resources
Resource
Anything that can be identified by a URI
a document, part of a document, or image on the Web
e.g. http://www.cs.vassar.edu/~ide
a real world object
e.g. a book: isbn://9402-5546-1234
URIs
Uniform Resource Identifier
The generic set of all names/addresses consisting of short strings that refer to resources
URLs (Uniform Resource Locators) are a particular type of URI, used on the WWW
URIs look like URLs, sometimes with fragment identifiers to point at specific parts of a document
http://somedomain.com/some/path/to/file#fragment
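To make the role of the fragment identifier concrete, here is a small sketch that splits the example URI using Python's standard urllib.parse module. The variable names are illustrative only.

```python
from urllib.parse import urldefrag, urlparse

uri = "http://somedomain.com/some/path/to/file#fragment"

# urldefrag separates the document part from the fragment identifier.
document, fragment = urldefrag(uri)

# urlparse exposes the remaining components of the URI.
parts = urlparse(uri)

print(document)
print(fragment)
print(parts.scheme, parts.netloc, parts.path)
```

The part before the # names the document; the fragment points at something inside it, which is how RDF can name a specific element of a document.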
RDF
Basic element is the triple
a resource (the subject) is linked to another resource (the object) via an arc labeled by a relation (the predicate)
<subject> has a property <predicate> valued by <object>
Example
Nancy Ide --author-of--> Encoding Syntactic Annotation
Examples
Statements
The English word “car” translates to the French word “voiture”
The word “car” is a noun
Nancy Ide is the author of “Encoding Syntactic Annotation”
SUBJECT      PREDICATE       OBJECT
CAR          translates-to   voiture
CAR          is-a            noun
Nancy Ide    author-of       Encoding Syntactic Annotation
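The three statements can be written down directly as subject, predicate, object records. The sketch below uses plain Python named tuples rather than any RDF library; the names Triple and objects_of are hypothetical, and the short strings stand in for what would be URIs in real RDF.

```python
from collections import namedtuple

# One RDF statement: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

statements = [
    Triple("CAR", "translates-to", "voiture"),
    Triple("CAR", "is-a", "noun"),
    Triple("Nancy Ide", "author-of", "Encoding Syntactic Annotation"),
]


def objects_of(subject, predicate, triples):
    """Return every object linked to the subject by the given predicate."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]


print(objects_of("CAR", "translates-to", statements))
print(objects_of("Nancy Ide", "author-of", statements))
```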
RDF Triples
The subject of one statement can be the object of another statement
RESULT: a labeled directed graph
Vassar College --employee--> Nancy Ide --author-of--> Encoding Syntactic Annotation
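A minimal sketch of how chained triples form a labeled directed graph: each subject maps to its outgoing (predicate, object) arcs, and following the arcs from Vassar College reaches both Nancy Ide and the paper. The function names build_graph and reachable are hypothetical, and plain strings stand in for URIs.

```python
from collections import defaultdict

triples = [
    ("Vassar College", "employee", "Nancy Ide"),
    ("Nancy Ide", "author-of", "Encoding Syntactic Annotation"),
]


def build_graph(triples):
    """Map each subject to the list of (predicate, object) arcs leaving it."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Return every node that can be reached from start by following arcs."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


graph = build_graph(triples)
print(reachable(graph, "Vassar College"))
```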
RDF Syntax
One syntax for expressing RDF statements is XML
Tags and attributes have a specific meaning
Description element describes a resource
every attribute or nested element inside a Description is a property of that resource
<Description about="http://www.cs.vassar.edu/~ide">
  <author-of>Encoding Syntactic Annotation</author-of>
</Description>
<Description about="http://www.vassar.edu">
  <employee resource="http://www.cs.vassar.edu/~ide"/>
</Description>
Does this solve the structure and vocabulary problems?
RDF/XML Syntax is Just a Syntax
Different ways to express the same model

<Description about="http://www.cs.vassar.edu/~ide">
  <author-of>Encoding Syntactic Annotation</author-of>
</Description>
<Description about="http://www.vassar.edu">
  <employee resource="http://www.cs.vassar.edu/~ide"/>
</Description>

<Description about="http://www.vassar.edu">
  <employee resource="http://www.cs.vassar.edu/~ide">
    <author-of>Encoding Syntactic Annotation</author-of>
  </employee>
</Description>

<Description about="http://www.cs.vassar.edu/~ide" author-of="Encoding Syntactic Annotation"/>
<Description about="http://www.vassar.edu">
  <employee resource="http://www.cs.vassar.edu/~ide"/>
</Description>
Namespaces
Use namespaces to indicate where the defining RDF schema exists
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:vassar="http://www.vassar.edu/schema.rdf"
         xmlns:biblio="http://www.libraries.org/schema.rdf">
  <rdf:Description rdf:about="http://www.cs.vassar.edu/~ide">
    <biblio:author-of>Encoding Syntactic Annotation</biblio:author-of>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.vassar.edu">
    <vassar:employee rdf:resource="http://www.cs.vassar.edu/~ide"/>
  </rdf:Description>
</rdf:RDF>
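A prefixed name such as vassar:employee is shorthand for a full URI, formed by appending the local name to the namespace URI. The sketch below shows that expansion with a plain dictionary; the function name expand is hypothetical, and only the rdf and vassar prefixes from the example are used.

```python
# Prefix-to-namespace bindings, as declared with xmlns: attributes.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "vassar": "http://www.vassar.edu/schema.rdf",
}


def expand(qname, namespaces):
    """Expand a prefix:local name into a full URI by simple concatenation."""
    prefix, local = qname.split(":", 1)
    return namespaces[prefix] + local


print(expand("rdf:about", NAMESPACES))
print(expand("vassar:employee", NAMESPACES))
```

Because the expansion is plain concatenation, a namespace URI that does not end in # or / runs straight into the local name, which is why the schema namespaces in the later examples end in #.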
What is RDF Used For?
Make explicit statements about web resources
The computer knows that these are statements, knows how the statements relate, can compare values
But... we still lack a way to define a vocabulary
Should we use author or creator?
Is Nancy Ide an author?
Are there other authors?
What properties can authors have?
RDF and RDFS
RDF is a data model that allows you to assert relation(s) between two objects
RDFS (RDF Schema) is a means to define classes and sub-classes of objects and the relations that may hold between these objects
RDF Schema
RDF provides a data model for metadata annotation and a way to express it in XML, but it cannot define the vocabulary for a domain
RDF Schema allows you to define vocabulary terms and the relations between these terms
Adds semantics to RDF predicates and resources
Defines how a term should be interpreted by specifying its properties and the kinds of objects that can be the values of these properties
Some RDF Schema Terminology
RDF Schema core primitives
Class, Property
type, subClassOf, domain, range
Vocabulary definition with these primitives:
<Person, type, Class>
<Author, subClassOf, Person>
<Employee, domain, Person>
These are just RDF statements, but in RDF Schema they have special meaning
The semantics of RDF Schema
The semantics of RDF Schema are expressed in natural language:
2.3.2 rdfs:subClassOf
“This property specifies a subset/superset relation between classes. The rdfs:subClassOf property is transitive. If class A is a subclass of some broader class B, and B is a subclass of C, then A is also implicitly a subclass of C. Consequently, resources that are instances of class A will also be instances of class C, since A is a subset of both B and C. Only instances of rdfs:Class can have the rdfs:subClassOf property and the property value is always of rdf:type rdfs:Class. A class may be a subclass of more than one class.”
RDF Model Theory
Set-theoretical semantics for RDF and RDFS specifies entailment rules, for example:
[rdfs7b] (reflexivity) (xxx, rdf:type, rdfs:Class) => (xxx, rdfs:subClassOf, xxx)
[rdfs8] (transitivity) (xxx, rdfs:subClassOf, yyy) & (yyy, rdfs:subClassOf, zzz) => (xxx, rdfs:subClassOf, zzz)
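The two entailment rules can be read as a recipe for computing all implied subClassOf statements. The sketch below applies reflexivity once and then repeats the transitivity rule until nothing new is derived. It is a naive fixed-point loop over a toy class set (the A, B, C of the quoted specification text), not how a real RDF reasoner is implemented, and the function name subclass_closure is hypothetical.

```python
def subclass_closure(classes, direct):
    """Return all (sub, super) pairs entailed by reflexivity and transitivity."""
    # rdfs7b: every class is a subclass of itself.
    closure = {(c, c) for c in classes}
    closure |= set(direct)

    # rdfs8: if a is a subclass of b and b of d, then a is a subclass of d.
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


closure = subclass_closure({"A", "B", "C"}, {("A", "B"), ("B", "C")})
print(sorted(closure))
```

The pair (A, C) appears in the result even though it was never stated, which is the "inferring new knowledge" step the Semantic Web relies on.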
Example RDF Schema
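[Figure: a two-level graph]
Ontology Level
Noun sub-class of Part-of-Speech
Verb sub-class of Part-of-Speech
Common Noun sub-class of Noun
Motion Verb sub-class of Verb
Subject-of: a property linked to classes by domain and range arcs
Data Level
Dogs Subject-of run
Dogs and run are each linked to a class by a type arc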
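The link between the data level and the ontology level can be sketched as a walk up the sub-class chain: once a word has a type, it is also an instance of every superclass. The typing of Dogs as Common Noun and run as Motion Verb is an assumption read off the figure layout, and the dictionary and function names are hypothetical.

```python
# Ontology level: each class mapped to its direct superclass.
SUBCLASS_OF = {
    "Noun": "Part-of-Speech",
    "Verb": "Part-of-Speech",
    "Common Noun": "Noun",
    "Motion Verb": "Verb",
}

# Data level: each word mapped to its stated type (assumed from the figure).
TYPE_OF = {"Dogs": "Common Noun", "run": "Motion Verb"}


def all_types(resource):
    """Return the stated type of a resource followed by every inherited type."""
    types = []
    cls = TYPE_OF.get(resource)
    while cls is not None:
        types.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return types


print(all_types("Dogs"))
print(all_types("run"))
```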
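[Figure: the same ontology, now placed under the RDF Schema language level]
Language Level
Property sub-class of Resource
Class sub-class of Resource
Ontology Level
Noun sub-class of Part-of-Speech
Verb sub-class of Part-of-Speech
Common Noun sub-class of Noun
Motion Verb sub-class of Verb
Subject-of: a property linked to classes by domain and range arcs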
Observations
Classes and properties are modeled separately!
Different from typical Object-Oriented modeling where properties (attributes) are part of a class
Because of this, domain/range statements are very restrictive
Remember: RDF Schema is just RDF, but with some added meaning to particular terms
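Domain Restrictions
[Figure: the Part-of-Speech hierarchy (Noun, Verb, Common Noun, Motion Verb) with a Gender property attached by a domain arc; the words chat and bouge each carry the Gender value M]
“M” is a literal value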
Problem solved...
[Figure: Gender now takes Part-of-Speech, the parent of Noun and Verb, as its domain]
Moving the domain restriction up the hierarchy solves the problem
But it risks over-generalization
properties get “loose” restrictions
classes may be allowed properties they should not have
e.g. now any part of speech has the GENDER property
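The trade-off can be sketched by treating the domain as a check, as these slides do: a word may carry Gender only if its class falls under the property's domain. The typing of chat as Common Noun and bouge as Motion Verb is an assumption read off the figure, and the helper names is_a and allowed are hypothetical.

```python
# Each class mapped to its direct superclass.
SUBCLASS_OF = {
    "Noun": "Part-of-Speech",
    "Verb": "Part-of-Speech",
    "Common Noun": "Noun",
    "Motion Verb": "Verb",
}

# Assumed typing of the two example words.
TYPE_OF = {"chat": "Common Noun", "bouge": "Motion Verb"}


def is_a(cls, ancestor):
    """True if cls equals ancestor or lies below it in the hierarchy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def allowed(resource, domain):
    """True if the resource's class falls within the property's domain."""
    return is_a(TYPE_OF[resource], domain)


# With Gender restricted to Noun, the verb is rejected.
print(allowed("chat", "Noun"), allowed("bouge", "Noun"))
# With the domain moved up, every part of speech is accepted.
print(allowed("chat", "Part-of-Speech"), allowed("bouge", "Part-of-Speech"))
```

Widening the domain makes both words acceptable, but it also makes Gender acceptable on any class in the hierarchy, which is the over-generalization noted above.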
RDF Schema Syntax
Class Definitions
<rdfs:Class rdf:about="http://www.linguistics.org/schema.rdf#PartOfSpeech">
  <rdfs:label>POS</rdfs:label>
  <rdfs:comment>Class for the general category part of speech</rdfs:comment>
</rdfs:Class>
<rdfs:Class rdf:about="http://www.linguistics.org/schema.rdf#Noun">
  <rdfs:label>Noun</rdfs:label>
  <rdfs:comment>Class for nouns</rdfs:comment>
  <rdfs:subClassOf rdf:resource="http://www.linguistics.org/schema.rdf#PartOfSpeech"/>
</rdfs:Class>
Property Definition
<rdf:Property rdf:about="http://www.linguistics.org/schema.rdf#number">
  <rdfs:domain rdf:resource="http://www.linguistics.org/schema.rdf#PartOfSpeech"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</rdf:Property>
Putting It All Together
The schema file: http://www.linguistics.org/schema.rdf
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://www.linguistics.org/schema.rdf#PartOfSpeech">
    <rdfs:label>POS</rdfs:label>
    <rdfs:comment>Class for the general category part of speech</rdfs:comment>
  </rdfs:Class>
  <rdfs:Class rdf:about="http://www.linguistics.org/schema.rdf#Noun">
    <rdfs:label>Noun</rdfs:label>
    <rdfs:comment>Class for nouns</rdfs:comment>
    <rdfs:subClassOf rdf:resource="http://www.linguistics.org/schema.rdf#PartOfSpeech"/>
  </rdfs:Class>
  <rdf:Property rdf:about="http://www.linguistics.org/schema.rdf#number">
    <rdfs:domain rdf:resource="http://www.linguistics.org/schema.rdf#PartOfSpeech"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>
</rdf:RDF>
Using the Schema
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:pos="http://www.linguistics.org/schema.rdf#">
  <pos:Noun rdf:ID="dogs">
    <pos:number rdf:value="Plural"/>
  </pos:Noun>
  <pos:Verb rdf:ID="run">
    <pos:number rdf:value="Plural"/>
  </pos:Verb>
</rdf:RDF>
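To show what statements this markup carries, the sketch below reads it with Python's standard xml.etree.ElementTree and lists one type statement and one property statement per word. It is not a conforming RDF/XML parser: it handles only this typed-node pattern and, as a simplifying assumption, takes the rdf:value attribute as the property's value. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:pos="http://www.linguistics.org/schema.rdf#">
  <pos:Noun rdf:ID="dogs">
    <pos:number rdf:value="Plural"/>
  </pos:Noun>
  <pos:Verb rdf:ID="run">
    <pos:number rdf:value="Plural"/>
  </pos:Verb>
</rdf:RDF>"""


def tag_to_uri(tag):
    """Turn ElementTree's "{namespace}local" tag form into a full URI."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def typed_nodes(xml_text):
    """List (subject, predicate, object) rows for each typed node element."""
    rows = []
    for node in ET.fromstring(xml_text):
        node_id = node.get("{%s}ID" % RDF_NS)
        # The element name itself states the class of the resource.
        rows.append((node_id, "rdf:type", tag_to_uri(node.tag)))
        for prop in node:
            rows.append((node_id, tag_to_uri(prop.tag),
                         prop.get("{%s}value" % RDF_NS)))
    return rows


rows = typed_nodes(DOC)
for row in rows:
    print(row)
```

The class and property names come out as full URIs in the schema namespace, so any tool that knows the schema can interpret them without seeing this particular file's prefixes.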
Defining a Default Namespace
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.linguistics.org/schema.rdf#">
  <Noun rdf:ID="dogs">
    <number rdf:value="Plural"/>
  </Noun>
  <Verb rdf:ID="run">
    <number rdf:value="Plural"/>
  </Verb>
</rdf:RDF>
Referring To Another Resource
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.linguistics.org/schema.rdf#">
  <Noun rdf:about="Mydoc#W1">
    <number rdf:value="Plural"/>
  </Noun>
  <Verb rdf:about="Mydoc#W2">
    <number rdf:value="Plural"/>
  </Verb>
</rdf:RDF>
Creating Pre-defined Linguistic Objects
One possible use of RDF is to pre-define “linguistic objects” that can be used by other resources such as lexicons, taggers, etc.
An RDF schema defines a class and its properties, but does not instantiate objects of that class
in previous examples, “dogs” and “run” were instantiated as objects of the classes Noun and Verb
A “Data Category” Definition
File: http://www.linguistics.org/categories.rdf
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.linguistics.org/schema.rdf#">
  <Noun rdf:ID="NMP">
    <gender rdf:value="masculine"/>
    <number rdf:value="plural"/>
  </Noun>
  <Verb rdf:ID="V3pl">
    <number rdf:value="plural"/>
    <person rdf:value="3rd"/>
  </Verb>
</rdf:RDF>
Using the Definition
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ling="http://www.linguistics.org/schema.rdf#">
  <ling:word rdf:value="dog">
    <ling:POS rdf:resource="http://www.linguistics.org/categories.rdf#NMS"/>
  </ling:word>
  <ling:word rdf:about="http://www.mySite.edu/myDoc#W1">
    <ling:POS rdf:resource="http://www.linguistics.org/categories.rdf#NMS"/>
  </ling:word>
</rdf:RDF>
Additions to the linguistics schema.rdf
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://www.linguistics.org/schema.rdf#word">
    <rdfs:label>Word</rdfs:label>
    <rdfs:comment>Class for a word</rdfs:comment>
  </rdfs:Class>
  <rdf:Property rdf:ID="POS">
    <rdfs:domain rdf:resource="http://www.linguistics.org/schema.rdf#word"/>
    <rdfs:range rdf:resource="http://www.linguistics.org/schema.rdf#PartOfSpeech"/>
  </rdf:Property>
</rdf:RDF>
Beyond RDF and RDFS
RDF and RDFS give us the capability to provide some semantics for resources and the relations between them
But there is a lot missing
boolean operators, cardinality constraints, disjunction, etc.
These are in the next level: OWL
The Semantic Web and Language Technology
The previous examples suggest how the Semantic Web can benefit language technology
Resources
Pre-defined linguistic objects can be used in lexicons, term banks, annotations, etc.
Goes toward a commonly agreed-upon set of categories
Language Processing applications
Can exploit linguistic knowledge “attached” to data to enhance capability
Resources and Tools
W3C RDF Model and Syntax Specification http://www.w3.org/TR/REC-rdf-syntax/
W3C RDF Schema Specification 1.0 http://www.w3.org/TR/2000/CR-rdf-schema-20000327/
W3C RDF Validation Service http://www.w3.org/RDF/Validator/
W3C RDF http://www.w3.org/RDF/
List of RDF resources http://www.ilrt.bris.ac.uk/discovery/rdf/resources/
SiRPAC - Simple RDF Parser & Compiler (Java) http://www.w3.org/RDF/Implementations/SiRPAC/
Libwww - RDF Parser (C) http://www.w3.org/Library/