ontology, semantic web and dbpedia
TRANSCRIPT
10/8/2009
Ontology, Semantic Web and
Global Database
10/9/2009 1Creative Commons - BY-NC
Contents
• Ontology? Why?
• Protégé
• Semantic Web
• Linked Open Data
Syntactic Web
Problems
A typical web page is written in a markup language, HTML, which is designed for rendering presentation and for hyperlinking to related information. Semantic content is accessible to humans but not to computers.
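Linguistic
[Figure: triangle relating Form, Concept and Referent. A Form activates a Concept, the Concept relates to a Referent, and the Form stands for the Referent. Example: the form "Tank", with the referent shown as "?".]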
Problems
• Keyword‐based Search
• Synonyms and Homonyms
• No Parameter Search
• No Cross Silos Data Extraction or Comparison
• No Unified View and/or Interpretation of Data
• Limited Ability to Re‐use Data
• Difficult to Share Data with Business Partners
Need to Add “Semantics”
• Using Ontology to specify the meaning of annotation.
– Ontology provides a set of vocabulary terms
– New terms can be defined with existing ones
– Meaning of each term can be formally specified
– The relationship between terms can be defined
Web
• Web 1.0 – links documents to documents
• Web 2.0 – provides content from users
• Web 3.0 – links data to data
What is Ontology? http://en.wikipedia.org/wiki/Ontology_%28information_science%29
• In computer science and information science, an ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to define the domain.
• An ontology is a formal, explicit specification of a conceptualization.
XML (Extensible Markup Language)
It is a textual data format, with strong support via Unicode for the languages of the world. Although XML’s design focuses on documents, it is widely used for the representation of arbitrary data structures.
Well‐formed and error‐handling
• It contains only properly‐encoded legal Unicode characters.
• None of the special syntax characters such as "<" and "&" appear except when performing their markup‐delineation roles.
• The begin, end, and empty‐element tags which delimit the elements are correctly nested, with none missing and none overlapping.
• The element tags are case‐sensitive; the beginning and end tags must match exactly.
• There is a single "root" element which contains all the other elements.
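The well‐formedness rules above can be checked mechanically. The following is a minimal sketch using the XML parser in Python's standard library; the helper name is_well_formed and the sample documents are assumptions made for this illustration and are not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Illustrative documents, roughly one per rule on the slide.
good = "<book><title>Weaving the Web</title></book>"
overlapping = "<a><b></a></b>"          # tags overlap instead of nesting
case_mismatch = "<Title>text</title>"   # begin and end tags differ in case
two_roots = "<a/><b/>"                  # more than one root element
bare_ampersand = "<a>Tom & Jerry</a>"   # "&" used outside its markup role

for label, doc in [
    ("good", good),
    ("overlapping", overlapping),
    ("case_mismatch", case_mismatch),
    ("two_roots", two_roots),
    ("bare_ampersand", bare_ampersand),
]:
    print(label, is_well_formed(doc))
```

Only the first document should be accepted; each of the others breaks one of the listed rules. Note that this checks well‐formedness only, not validity against a schema.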
XSD (XML Schema)
XSD can be used to express a set of rules to which an XML document must conform in order to be considered 'valid' according to that schema. However, unlike most other schema languages, XSD was also designed with the intent that determination of a document's validity would produce a collection of information adhering to specific data types.
XSD datatypes ‐ 1/2
• xsd:string
• xsd:boolean
• xsd:decimal
• xsd:float
• xsd:double
• xsd:dateTime
• xsd:time
• xsd:date
• xsd:gYearMonth
• xsd:gYear
• xsd:gMonthDay
• xsd:gDay
• xsd:gMonth
• xsd:hexBinary
• xsd:base64Binary
• xsd:anyURI
• xsd:normalizedString
• xsd:token
XSD datatypes ‐ 2/2
• xsd:language
• xsd:NMTOKEN
• xsd:Name
• xsd:NCName
• xsd:integer
• xsd:nonPositiveInteger
• xsd:negativeInteger
• xsd:long
• xsd:int
• xsd:short
• xsd:byte
• xsd:nonNegativeInteger
• xsd:unsignedLong
• xsd:unsignedInt
• xsd:unsignedShort
• xsd:unsignedByte
• xsd:positiveInteger
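To make the idea of "information adhering to specific data types" concrete, here is a rough sketch that turns the lexical form of a few of the listed XSD datatypes into typed Python values. The PARSERS table and the to_python helper are hypothetical names invented for this example, and Python's built‐in parsers only approximate the XSD lexical rules (they are slightly more permissive).

```python
from datetime import date, datetime
from decimal import Decimal


def parse_boolean(lexical: str) -> bool:
    """xsd:boolean has exactly four lexical forms: true, false, 1 and 0."""
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    raise ValueError(f"not an xsd:boolean: {lexical!r}")


def parse_non_negative_integer(lexical: str) -> int:
    """xsd:nonNegativeInteger is an integer restricted to values >= 0."""
    value = int(lexical)
    if value < 0:
        raise ValueError(f"not an xsd:nonNegativeInteger: {lexical!r}")
    return value


# Hypothetical lookup table: datatype name -> approximate Python parser.
PARSERS = {
    "xsd:string": str,
    "xsd:boolean": parse_boolean,
    "xsd:decimal": Decimal,
    "xsd:integer": int,
    "xsd:nonNegativeInteger": parse_non_negative_integer,
    "xsd:date": date.fromisoformat,
    "xsd:dateTime": datetime.fromisoformat,
}


def to_python(lexical: str, datatype: str):
    """Convert a lexical value to a typed Python value for a known datatype."""
    return PARSERS[datatype](lexical)


print(to_python("true", "xsd:boolean"))
print(to_python("2009-10-09", "xsd:date"))
print(to_python("3.14", "xsd:decimal"))
```

A real validator would cover all datatypes and enforce the exact lexical space of each; this sketch only shows the mapping from text to typed values.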
RDF (Resource Description Framework)
RDF describes statements about resources, in particular Web resources, in the form of subject‐predicate‐object expressions. These expressions are known as triples in RDF terminology.
RDF vocabulary
• rdf:type
• rdf:Property
• rdf:XMLLiteral
• rdf:nil
• rdf:List
• rdf:Statement
• rdf:subject
• rdf:predicate
• rdf:object
• rdf:first
• rdf:rest
• rdf:Seq
• rdf:Bag
• rdf:Alt
• rdf:_1
• rdf:_2 ...
• rdf:value
Triples and Graph
The base element of the RDF model is the triple:
• a resource (the subject)
• links (the predicate)
• another resource (the object)
A resource <subject> has a property <predicate> valued by <object>.
<subject> <predicate> <object>
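The triple model can be sketched in a few lines of plain Python. This is an illustration only: the example.org namespace, the resource names and the match helper are assumptions made for the example, and a real application would more likely use an RDF library or a triple store.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


EX = "http://example.org/"  # hypothetical namespace for illustration

# A graph is simply a set of triples.
graph = {
    Triple(EX + "Protege", EX + "developedBy", EX + "StanfordUniversity"),
    Triple(EX + "Protege", EX + "isA", EX + "OntologyEditor"),
    Triple(EX + "TopBraid", EX + "isA", EX + "OntologyEditor"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    )


# "Which resources are ontology editors?"
for t in match(graph, predicate=EX + "isA", obj=EX + "OntologyEditor"):
    print(t.subject)
```

Pattern matching with wildcards is, in spirit, what a query language over triples does; the graph view arises because the object of one triple can be the subject of another.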
Pros and Cons of RDF
• Pros
– Universal data model (maps to XML, object and relational models)
– Additive, easy to merge multiple RDF graphs
– Predicate logic (like Prolog)
– Uses URIs to identify resources
• Cons
– Lacks a concept of enumeration
– Lacks data types
– No Object‐Oriented Features
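The "additive" property can be illustrated with ordinary set union. This is a simplified sketch assuming graphs without blank nodes; the example.org URIs and the statements themselves are made up for the illustration.

```python
EX = "http://example.org/"  # hypothetical namespace for illustration

# Two independently published graphs, each a set of (s, p, o) tuples.
graph_a = {
    (EX + "Berlin", EX + "country", EX + "Germany"),
    (EX + "Berlin", EX + "label", "Berlin"),
}
graph_b = {
    (EX + "Berlin", EX + "country", EX + "Germany"),  # also stated in graph_a
    (EX + "Germany", EX + "capital", EX + "Berlin"),
}

# Because both graphs identify resources with the same URIs, merging is
# just set union: duplicates collapse and nothing has to be re-keyed.
merged = graph_a | graph_b
print(len(merged))  # 3
```

This works only because URIs act as globally shared identifiers; merging two relational tables with locally generated keys would need an explicit mapping step first.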
Resource (RDFS)
RDF Schema (RDFS) is an extensible knowledge representation language, providing basic elements for the description of ontologies, otherwise called Resource Description Framework (RDF) vocabularies, intended to structure RDF resources.
Classes
• rdfs:Resource
• rdfs:Literal
• rdfs:Class
• rdfs:Datatype
• rdfs:Container
• rdfs:ContainerMembershipProperty
• rdf:List
• rdf:Statement
• rdf:Bag
• rdf:Seq
• rdf:Alt
• rdf:XMLLiteral
• rdf:Property
Properties
• rdfs:subClassOf
• rdfs:subPropertyOf
• rdfs:domain
• rdfs:range
• rdfs:label
• rdfs:comment
• rdfs:member
• rdfs:seeAlso
• rdfs:isDefinedBy
• rdf:first
• rdf:rest
• rdf:type
• rdf:value
• rdf:subject
• rdf:predicate
• rdf:object
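As an illustration of how RDFS vocabulary structures RDF resources, the sketch below derives the implied types of a resource by following rdfs:subClassOf transitively. The ex: class names and the helper functions are assumptions made for this example; it covers only this one inference pattern, not the full set of RDFS entailment rules.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical data: a small class hierarchy and one instance.
triples = {
    ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
}


def superclasses(cls, triples):
    """All classes reachable from `cls` via rdfs:subClassOf (transitive)."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


def inferred_types(resource, triples):
    """Asserted types of `resource` plus every superclass of those types."""
    direct = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    result = set(direct)
    for cls in direct:
        result |= superclasses(cls, triples)
    return result


print(sorted(inferred_types("ex:rex", triples)))
```

Only "ex:rex rdf:type ex:Dog" is stated explicitly; membership in ex:Mammal and ex:Animal follows from the schema.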
Web Ontology Language
Web Ontology Language (OWL)
• Extends RDF/RDFS to support complex knowledge representation.
• An OWL ontology may include descriptions of classes, properties and their instances.
• Open‐World assumption – what is not known is not “untrue”, it is just “unknown”.
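The difference between closed‐world and open‐world answers can be sketched as follows. This is a toy illustration, not how an OWL reasoner is implemented; the facts, the set of explicit negative statements and the function names are all assumptions made for the example.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical knowledge base: statements known to hold, and statements
# explicitly known not to hold.
facts = {("ex:alice", "ex:knows", "ex:bob")}
negated = {("ex:alice", "ex:knows", "ex:carol")}


def closed_world(triple):
    """Database-style answer: anything not stated is treated as false."""
    return Answer.TRUE if triple in facts else Answer.FALSE


def open_world(triple):
    """Open-world answer: false only if explicitly ruled out."""
    if triple in facts:
        return Answer.TRUE
    if triple in negated:
        return Answer.FALSE
    return Answer.UNKNOWN


question = ("ex:alice", "ex:knows", "ex:dave")
print(closed_world(question).value)  # false
print(open_world(question).value)    # unknown
```

The same missing statement yields "false" under the closed‐world reading and "unknown" under the open‐world reading.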
OWL‐1
• OWL‐Lite
– Supports simple classification, allows only cardinalities (member count) of 1 and 0 and only minimal constraints.
• OWL‐DL (Description Logic)
– Supports more complex ontologies, but with guarantees, such as processing finishing in finite time, restricting elements to be one type.
• OWL‐Full
– Full support for maximum freedom of RDF, with no computational guarantees.
OWL Classes and Properties
partial list, see http://www.w3.org/TR/owl‐guide/ for full list
• Class
– owl:Class
– rdfs:subClassOf
• Property
– owl:ObjectProperty
– owl:DatatypeProperty
– rdfs:subPropertyOf
– rdfs:domain
– rdfs:range
• Property Characteristic
– owl:TransitiveProperty
– owl:FunctionalProperty
– owl:inverseOf
– owl:InverseFunctionalProperty
• Property Restrictions
– owl:allValuesFrom
– owl:someValuesFrom
– owl:cardinality
– owl:hasValue
• Equivalence
– owl:equivalentClass
– owl:equivalentProperty
– owl:sameAs
• Complex Classes
– owl:intersectionOf
– owl:unionOf
– owl:complementOf
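Two of the property characteristics listed above, transitivity and inverse properties, can be illustrated with a small sketch of the inferences they license. The properties ex:locatedIn and ex:contains, the data, and the helper functions are hypothetical; a real reasoner such as those named on the Tools slide handles far more than this.

```python
def apply_transitive(pairs):
    """Transitive closure of a set of (x, y) pairs, computed to a fixpoint."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def apply_inverse(pairs):
    """Pairs of the inverse property: (y, x) for every (x, y)."""
    return {(y, x) for x, y in pairs}


# Assume ex:locatedIn is declared an owl:TransitiveProperty and
# ex:contains is declared owl:inverseOf ex:locatedIn (both hypothetical).
located_in = {("ex:Berlin", "ex:Germany"), ("ex:Germany", "ex:Europe")}

inferred_located_in = apply_transitive(located_in)
contains = apply_inverse(inferred_located_in)

print(("ex:Berlin", "ex:Europe") in inferred_located_in)  # True
print(("ex:Europe", "ex:Berlin") in contains)             # True
```

Neither "Berlin is located in Europe" nor "Europe contains Berlin" was stated; both follow from the declared characteristics of the properties.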
Semantic Web Layer Cake
From: http://www.semanticfocus.com/blog/entry/title/introduction‐to‐the‐semantic‐web‐vision‐and‐technologies‐part‐1‐overview/
Tools
• RDF/OWL Editors
– Protégé, TopBraid, …
• RDF Store
– SwiftOWLIM, AllegroGraph, OpenLink Virtuoso, …
• Query
– SPARQL
• Reasoners
– Pellet, FaCT++, …
Protégé Overview
• Stanford Center for Biomedical Informatics Research
– Stanford University
– University of Manchester
• OWL Editor
• Plugins: Natural Language, Visualization, Rules Engine, Database, …
• Very well documented
• Long history with strong academic support
Protégé – Class View
Protégé – Object Property View
Protégé – Value Property View
Protégé ‐ Visualization
Ontology Development
• Define purpose and scope
• Elicit knowledge
• Collect and organize concepts
• Classify and add axioms
• Reasoning
OWL vs. UML class modeling
• OWL properties vs. UML associations & attributes
– OWL properties have a direction
– OWL properties are binary relations
– OWL properties are “first‐class” citizens (global scope)
• OWL classes vs. UML classes
– OWL classes have no operations
– OWL classes can have “sufficient” conditions
• Primitive vs. defined classes
Ontologies and Data Models
• Ontologies live in an open, distributed world; data models in a closed world
• Writing a model in OWL does not make it an ontology
– The ontology should be shared
Semantic Web
Web Technologies
From: http://www.abricocotier.fr/5694‐les‐trois‐grandes‐etapes‐de‐levolution‐du‐web
Benefits of Semantic Web Applications
• Less coding, more meaningful data structure
• Fewer business rules
• More cross‐boundary information
• Embedded logic
Global Database
From: Tim Berners‐Lee, Weaving the Web, 1999
• "If HTML and the Web made all the online documents look like one huge book, RDF, schema, and inference languages will make all the data in the world look like one huge database"
Welcome to the Internet
One Global Machine
Dimension of Global Machine
From: http://www.kk.org/thetechnium/archives/2007/11/dimensions_of_t.php
170 quadrillion (170 * 10^15) Transistors
55 trillion (55 * 10^12) Links
2 megahertz Emails
31 kilohertz Text Messages
162 kilohertz Instant Messages
14 kilohertz Search
246 exabytes Storage
9 exabytes (9 * 10^18) RAM
9 terabytes/second Bandwidth
800 billion kWh/year Power consumption
DBpedia
• Structures information from multiple Wikipedia editions so that it can be queried directly
• Built from scratch: 170 classes, 900 properties
• Serves as a hub for other databases
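Querying DBpedia directly is done with SPARQL. The sketch below only builds a request URL for a simple query and does not send it; it assumes DBpedia's public endpoint at https://dbpedia.org/sparql is reachable and follows the standard SPARQL protocol, in which the query text is passed in the "query" parameter. The example resource and the helper name build_request_url are chosen for illustration.

```python
from urllib.parse import urlencode

# Public DBpedia endpoint; its availability is an assumption of this sketch.
ENDPOINT = "https://dbpedia.org/sparql"

# Ask for the English label of one resource (illustrative query).
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin> rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Build a SPARQL-protocol GET URL; the query goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Fetching this URL with any HTTP client (requesting, for example, the application/sparql-results+json media type in the Accept header) would return the matching bindings, provided the endpoint is available.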
Multilingual
Abstracts
– English: 2,613,000
– German: 391,000
– French: 383,000
– Dutch: 284,000
– Polish: 256,000
– Italian: 286,000
– Spanish: 226,000
– Japanese: 199,000
– Portuguese: 246,000
– Swedish: 144,000
– Chinese: 101,000
• May 2007: 500 million RDF triples
• April 2008: 2 billion RDF triples
• Sept 2008
Linked Open Database
March 2009: 4.5 billion RDF triples, 180 million data links
[Figure: linked datasets grouped by domain: Music, Online Activities, Publications, Geographic, Cross-Domain, Life Sciences]
Open Questions
• Architecture Impact
• Device Applications
• Device Management
• Data Structure and Management
• Software Evolution, new requirements
• Competitors’ offers
• …
Thank You for Your Attention