swoogle tutorial (part i: swoogle r & d)
Post on 12-Jan-2016
eBiquity Lab, CSEE, UMBC @
Swoogle Tutorial (Part I: Swoogle R & D)
A brief introduction to Swoogle
An overview of Swoogle research
A summary of Swoogle development
Presented by eBiquity Lab, CSEE, UMBC
1. Introduction
Motivation
Swoogle in the Semantic Web
Glossary
Swoogle Architecture
Motivation
(Google + Web) has made us all smarter. Something similar is needed by people and software agents for information on the Semantic Web.
The Role of Swoogle in the Semantic Web
[Diagram. Boxes: Software Agents, Applications; Semantic Web Services; Semantic Web data (SW data service, database, (Web) document, RDF document); Directory/Digest Service, containing a Service Finder and a Data Finder (Swoogle). Arrow labels: uses (two arrows), digests (two arrows), searches.]
Concepts Explained
[Diagram: two SWDs, an ontology document (SWO) and an instance document (SWI), with their terms and individuals labeled.]
SWO: http://xmlns.com/foaf/1.0/
  foaf:Person rdf:type rdfs:Class (Term: Class)
  foaf:Person rdfs:subClassOf wordNet:Agent
  foaf:mbox rdf:type rdf:Property (Term: Property)
  foaf:mbox rdfs:domain foaf:Person
SWI: http://foo.com/foaf.rdf
  http://foo.com/foaf.rdf#finin rdf:type foaf:Person (Individual)
  http://foo.com/foaf.rdf#finin foaf:mbox finin@umbc.edu
NOTE: Qualified names (QNames) are used to shorten well-known namespaces as follows:
  rdf: => http://www.w3.org/1999/02/22-rdf-syntax-ns#
  rdfs: => http://www.w3.org/2000/01/rdf-schema#
  foaf: => http://xmlns.com/foaf/1.0/
  wordNet: => http://xmlns.com/wordnet/1.6/
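The QName note can be illustrated with a small lookup table. This is only a sketch: the prefix-to-namespace pairs are copied from the note (with the rdfs namespace written with its trailing `#`), and the function names `expand_qname` and `shorten_uri` are made up for this example.

```python
# Prefix table copied from the tutorial's note; the URIs are the tutorial's own.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/1.0/",
    "wordNet": "http://xmlns.com/wordnet/1.6/",
}


def expand_qname(qname: str) -> str:
    """Turn a QName such as 'foaf:Person' into a full URI reference."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {qname!r}")
    return PREFIXES[prefix] + local


def shorten_uri(uri: str) -> str:
    """Turn a full URI back into a QName when its namespace is known."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # unknown namespace: leave the URI unchanged


print(expand_qname("foaf:mbox"))
print(shorten_uri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))
```

A URI whose namespace is not in the table, such as http://foo.com/foaf.rdf#finin, is returned unchanged by `shorten_uri`.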
Glossary
Document
  A Semantic Web Document (SWD) is an online document written in a semantic web language (i.e. RDF or OWL).
  An ontology document (SWO) is an SWD that contains mostly term definitions (i.e. classes and properties). It corresponds to the T-Box in Description Logic.
  An instance document (SWI or SWDB) is an SWD that contains mostly class individuals. It corresponds to the A-Box in Description Logic.
Term
  A term is a non-anonymous RDF resource that is the URI reference of either a class or a property.
  Example: foaf:Person rdf:type rdfs:Class
Individual
  An individual is a non-anonymous RDF resource that is the URI reference of a class member.
  Example: http://.../foaf.rdf#finin rdf:type foaf:Person
In Swoogle, a document D is a valid SWD if and only if JENA* correctly parses D and produces at least one triple.
*JENA is a Java framework for writing Semantic Web applications. http://www.hpl.hp.com/semweb/jena2.htm
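The glossary's "contains mostly" wording can be made concrete in more than one way, and the tutorial does not say which rule Swoogle applies. A minimal sketch, assuming triples arrive as (subject, predicate, object) tuples of QName strings and that a simple majority of defined terms versus typed individuals decides the label; `classify_swd` and `TERM_TYPES` are hypothetical names, and parsing (done by Jena in Swoogle) is outside the sketch.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

# Types whose instances count as term definitions in this sketch (an assumption).
TERM_TYPES = {"rdfs:Class", "owl:Class", "rdf:Property"}


def classify_swd(triples: Iterable[Triple]) -> str:
    """Label a parsed document as 'SWO', 'SWI', 'undecided' or 'not an SWD'."""
    triples = list(triples)
    if not triples:
        # The glossary requires at least one triple for a valid SWD.
        return "not an SWD"
    terms = {s for s, p, o in triples if p == "rdf:type" and o in TERM_TYPES}
    individuals = {s for s, p, o in triples
                   if p == "rdf:type" and o not in TERM_TYPES}
    if len(terms) > len(individuals):
        return "SWO"
    if len(individuals) > len(terms):
        return "SWI"
    return "undecided"


ontology = [
    ("foaf:Person", "rdf:type", "rdfs:Class"),
    ("foaf:mbox", "rdf:type", "rdf:Property"),
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
]
instances = [
    ("http://foo.com/foaf.rdf#finin", "rdf:type", "foaf:Person"),
    ("http://foo.com/foaf.rdf#finin", "foaf:mbox", "finin@umbc.edu"),
]
print(classify_swd(ontology), classify_swd(instances))
```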
Swoogle Architecture
[Diagram. Layers: SWD discovery, metadata creation, data analysis, interface. Components: The Web, Candidate URLs, Web Crawler, SWD Reader, SWD Cache, SWD Metadata, IR analyzer, SWD analyzer, Web Server, Web Service, Agent Service.]
2. Swoogle Research
Discovery
Digest
Search & Navigation
Rank
Statistics
Discovery -- research
Discovering URLs of possible SWDs automatically
  Google crawler
  Focused crawler
  Semantic Web crawler, e.g. scutter
Revisiting URLs
Discovery -- results
Crawler performance
  The Google crawler is the best
  The focused crawler needs to be improved
Verified pure SWDs are only 1/3 of discovered URLs
Some NSWDs contain an embedded RDF graph

Crawler          SWD            NSWD           Undecided  TOTAL
Focused crawler  1,465 (7%)     10,580 (52%)   8,292      20,337
Google crawler   273,023 (36%)  369,371 (49%)  110,794    753,188
swd_crawler      61,870 (15%)   285,506 (70%)  57,709     405,085
TOTAL            336,358        665,457        176,795    1,178,610

Source: Swoogle (2005-Jan-05)
  SELECT `discovered_by`, sum(isRDF), sum(1-isRDF), count(*)
  FROM `digest_url`
  WHERE 1
  GROUP BY discovered_by
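The percentages in the crawler table can be recomputed from the raw counts. The sketch below copies the three per-crawler rows from the table; the helper names `shares` and `overall_swd_share` are made up for this example.

```python
# Counts copied from the crawler table: (SWD, NSWD, undecided).
COUNTS = {
    "Focused crawler": (1_465, 10_580, 8_292),
    "Google crawler": (273_023, 369_371, 110_794),
    "swd_crawler": (61_870, 285_506, 57_709),
}


def shares(swd: int, nswd: int, undecided: int):
    """Return (row total, SWD percent, NSWD percent), percents rounded."""
    total = swd + nswd + undecided
    return total, round(100 * swd / total), round(100 * nswd / total)


def overall_swd_share(counts) -> float:
    """Fraction of all discovered URLs that were verified as SWDs."""
    swd = sum(row[0] for row in counts.values())
    total = sum(sum(row) for row in counts.values())
    return swd / total


for name, row in COUNTS.items():
    total, swd_pct, nswd_pct = shares(*row)
    print(f"{name}: total={total:,} SWD={swd_pct}% NSWD={nswd_pct}%")
print(f"overall SWD share: {overall_swd_share(COUNTS):.1%}")
```

The overall share works out to 336,358 / 1,178,610, a little under 29%, which is the figure the slide rounds to 1/3.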
Digest -- research
Document metadata
  Annotative: general metadata, SWD metadata, ontology metadata
  Inter-document relations
  Document-term relations
Term metadata
  Term definition
  Inter-term relation
    Class-property bond (C-P bond): rdfs:domain
    Property-class bond (P-C bond): rdfs:range
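The two bonds map directly onto rdfs:domain and rdfs:range triples. A minimal sketch, assuming triples are (subject, predicate, object) tuples of QName strings; the function name `bonds` and the sample foaf:knows triple are illustrative and not taken from the tutorial.

```python
from collections import defaultdict


def bonds(triples):
    """Collect C-P bonds (via rdfs:domain) and P-C bonds (via rdfs:range)."""
    cp = defaultdict(set)  # class -> properties that declare it as their domain
    pc = defaultdict(set)  # property -> classes declared as its range
    for subject, predicate, obj in triples:
        if predicate == "rdfs:domain":
            cp[obj].add(subject)
        elif predicate == "rdfs:range":
            pc[subject].add(obj)
    return dict(cp), dict(pc)


triples = [
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
    ("foaf:name", "rdfs:domain", "foaf:Person"),
    ("foaf:knows", "rdfs:range", "foaf:Person"),  # illustrative triple
]
cp_bonds, pc_bonds = bonds(triples)
print(cp_bonds)
print(pc_bonds)
```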
Document Metadata
Web document metadata
  When/how discovered/fetched
  Suffix of URL
  Last modified time
  Document size
SWD metadata
  Language features
    OWL species
    RDF encoding
  Statistical features
    # of defined/used terms
    # of declared/used namespaces
    Ontology ratio
    Ontology rank
  Ontology annotation
    Label
    Version
    Comment
Relations
  Links to other SWDs
    Imported SWDs
    Referenced SWDs
    Extended SWDs
    Prior version
  Links to terms
    Classes/properties defined
    Classes/properties used
Demo2(a): Digest “Time” Ontology (document view)
Document-Term Relation
[Diagram: an ontology document and an instance document, with each document-term relation labeled.]
http://xmlns.com/foaf/1.0/
  foaf:Person rdf:type rdfs:Class (defined Class)
  foaf:Person rdfs:subClassOf wordNet:Agent
  foaf:mbox rdf:type rdf:Property (defined Property)
  foaf:mbox rdfs:domain foaf:Person
http://foo.com/foaf.rdf (the same instance graph is also shown under http://www.cs.umbc.edu/~finin/foaf.rdf)
  http://foo.com/foaf.rdf#finin rdf:type foaf:Person (defined Individual; populated Class: foaf:Person)
  http://foo.com/foaf.rdf#finin foaf:mbox finin@umbc.edu (populated Property: foaf:mbox)
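The defined/populated distinction can be sketched as a single pass over a document's triples. This is an illustration under stated assumptions, not Swoogle's digest code: triples are (subject, predicate, object) tuples of QName strings, RDFS schema predicates are skipped rather than counted as populated properties, and the relation names are borrowed from the navigation-model slide later in the deck.

```python
CLASS_TYPES = {"rdfs:Class", "owl:Class"}
PROPERTY_TYPES = {"rdf:Property"}
# Schema-level predicates are ignored when collecting populated properties
# (an assumption made to keep the example small).
SCHEMA_PREDICATES = {"rdfs:subClassOf", "rdfs:subPropertyOf",
                     "rdfs:domain", "rdfs:range"}


def document_term_relations(triples):
    """Return the terms a document defines and the terms it populates."""
    relations = {name: set() for name in (
        "definesClass", "definesProperty",
        "populatesClass", "populatesProperty")}
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            if obj in CLASS_TYPES:
                relations["definesClass"].add(subject)
            elif obj in PROPERTY_TYPES:
                relations["definesProperty"].add(subject)
            else:
                relations["populatesClass"].add(obj)
        elif predicate not in SCHEMA_PREDICATES:
            relations["populatesProperty"].add(predicate)
    return relations


foaf_ontology = [
    ("foaf:Person", "rdf:type", "rdfs:Class"),
    ("foaf:Person", "rdfs:subClassOf", "wordNet:Agent"),
    ("foaf:mbox", "rdf:type", "rdf:Property"),
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
]
foaf_instance = [
    ("http://foo.com/foaf.rdf#finin", "rdf:type", "foaf:Person"),
    ("http://foo.com/foaf.rdf#finin", "foaf:mbox", "finin@umbc.edu"),
]
print(document_term_relations(foaf_ontology))
print(document_term_relations(foaf_instance))
```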
Demo2(b): Digest “Time” Ontology (term view)
Term Metadata
Term definition
  rdfs:subClassOf -- foaf:Agent
  rdfs:label -- “Person”
C-P bond (from SWI)
  foaf:name
  dc:title
C-P bond (from SWO)
  foaf:mbox
  foaf:name
[Diagram: foaf:Person as described across three documents (Onto 1, Onto 2, SWD3). Definition triples: foaf:Person rdf:type owl:Class; foaf:Person rdfs:label “Person”; foaf:Person rdfs:subClassOf foaf:Agent. Domain triples: foaf:name rdfs:domain foaf:Person; foaf:mbox rdfs:domain foaf:Person. Instance triples: an individual with rdf:type foaf:Person and foaf:name “Tim Finin”.]
Demo4: Digest Term “Person”
Term Distribution (grouped by local name)

Case-insensitive       Case-sensitive
Name          Count    Rank  Name          Count    Rank  Name      Count
Name          656      1     name          560      11    source    129
Person        399      2     Person        357      12    email     125
Title         349      3     title         292      13    Book      124
Location      334      4     description   242      14    address   121
Description   288      5     location      213      15    Event     117
Date          257      6     type          196      16    Location  114
Type          242      7     date          173      17    author    111
country       236      8     value         154      18    Animal    111
Address       212      9     Organization  134      19    Country   104
organization  186      10    country       130      20    language  103
total         72502    total               76827
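Grouping terms by local name amounts to cutting each URI reference after its last `#` or `/` and counting. A minimal sketch; the sample URIs under example.org are invented, and the lower-casing used for the case-insensitive view is an assumption about how that grouping was done.

```python
from collections import Counter


def local_name(uri: str) -> str:
    """Return the part of a URI reference after the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1:]


def local_name_distribution(uris, case_sensitive: bool = True) -> Counter:
    """Count how many URI references share each local name."""
    names = (local_name(uri) for uri in uris)
    if not case_sensitive:
        names = (name.lower() for name in names)
    return Counter(names)


uris = [
    "http://xmlns.com/foaf/1.0/name",
    "http://example.org/a#Name",    # invented sample URIs
    "http://example.org/b#name",
    "http://example.org/b#Person",
]
print(local_name_distribution(uris).most_common())
print(local_name_distribution(uris, case_sensitive=False).most_common())
```

In the case-sensitive view "name" and "Name" are separate groups, which is why the two halves of the table report different counts and totals.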
Digest -- result
Ontological Term Distribution (populated, defined)

Type      Pop.  Def.  # terms  %     # populated  %
class     0     1     83,602   88%   0            0%
class     1     0     3,954    4%    1,002,961    13%
class     1     1     7,065    7%    6,483,485    87%
class     total       94,621         7,486,446
property  0     1     42,853   73%   0            0%
property  1     0     8,312    14%   2,438,455    6%
property  1     1     7,836    13%   36,899,842   94%
property  total       59,001         39,338,297

Source: Swoogle (2005-Jan-05)
  SELECT res_type, sign(cnt_instance_populate>0), sign(cnt_swd_def>0), count(*), sum(cnt_instance_populate)
  FROM `digest_term`
  WHERE 1
  GROUP BY res_type, sign(cnt_instance_populate>0), sign(cnt_swd_def>0)
Search & Navigation -- research
The Semantic Web is not the Web
Search service
  Document search: an RDF document is not free text
  Term search: URIref and compound local name
Navigation service
  The RDF graph: typed links
  The web of RDF documents: few hyperlinks
  The social network of agents: trust & provenance
Demo1: Find “Time” Ontology
  We can use a set of keywords to search for an ontology. For example, “time, before, after” are basic concepts for a “Time” ontology.
Demo3: Find Term “Person”
  Not capitalized! URIref is case sensitive!
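Keyword search over ontologies and the "compound local name" issue can be sketched together: split each local name into word tokens, then score an ontology by how many query keywords it covers. This is an illustration only; the tutorial does not describe Swoogle's actual index, and the function names, the scoring rule and the example.org ontologies are assumptions.

```python
import re


def split_local_name(name: str):
    """Split a compound local name, e.g. 'hasStartTime' -> has, start, time."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def keyword_score(keywords, local_names) -> int:
    """Number of query keywords found among the tokens of the local names."""
    tokens = set()
    for name in local_names:
        tokens.update(split_local_name(name))
    return sum(1 for keyword in keywords if keyword.lower() in tokens)


def rank_ontologies(keywords, ontologies):
    """Return ontology URLs with at least one hit, best match first."""
    scored = [(keyword_score(keywords, names), url)
              for url, names in ontologies.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [url for score, url in ordered if score > 0]


ontologies = {  # invented sample data: URL -> local names of defined terms
    "http://example.org/time": ["Time", "before", "after", "hasStartTime"],
    "http://example.org/people": ["Person", "name", "birth_date"],
}
print(rank_ontologies(["time", "before", "after"], ontologies))
```

The token matching here is deliberately case-insensitive. Looking up a specific URIref is a different operation and stays case-sensitive, so "person" and "Person" name different terms.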
Current Swoogle Navigation Model
A URIref refers to
  A term, i.e. an instance of an RDFS class/property
  An individual, i.e. populated terms
A SWD could be
  SWO: term definitions
  SWI: individuals
Observations
  RDF resources are semantically linked in the RDF graph
  SWDs are poorly linked due to the absence of an explicit hyperlink concept
  Ontologies are more interesting
Approach
  Build inter-document relations
  Rational surfing model
[Diagram labels: SWOs, SWIs, HTML documents, Images, Audio files, Video files]
Semantic Web Navigation Model (new!)
[Diagram: nodes connected by typed relations.]
Nodes: URL, URIref, Resource, RDF Document, Ontology, Namespace
Relations (grouped as in the diagram):
  populatesClass, populatesProperty, refersClass, refersProperty
  definesClass, definesProperty
  rdfsOntology, owldlOntology
  owl:imports, owl:priorVersion, owl:backwardCompatibleWith, owl:incompatibleWith
  rdfs:seeAlso, rdfs:isDefinedBy
  isDefinedBy, isUsedBy
  usesNamespace
  rdfs:subClassOf
  sameNamespace, sameLocalname
Entry points: RDF Graph Navigation, Term Search, Document Search
Ranking -- research
Surfing models
Ranking method: PageRank variation

Model                   What to rank    Scope         Idea
Rational surfing model  SWD             Semantic Web  Summarize inter-document relations as EX, TM, IM, PV
Plain graph model       Resource        RDF graph     The RDF graph is browsed as a weighted directed graph
RDFS-based model        Resource        RDF graph     The RDF graph is browsed only with RDFS semantics
SW navigation model     Resource & SWD  Semantic Web  Assume Swoogle is used in navigation
Ranking with Rational Surfing Model: An Example
[Diagram: four documents connected by inter-document relations.]
http://www.cs.umbc.edu/~finin/foaf.rdf
  an individual with rdf:type foaf:Person and foaf:mbox finin@umbc.edu
http://xmlns.com/foaf/1.0/
  foaf:Person rdf:type rdfs:Class
  foaf:Person rdfs:subClassOf wordNet:Person
http://xmlns.com/wordnet/1.6/
  wordNet:Person rdf:type rdfs:Class
  wordNet:Person rdfs:subClassOf wordNet:Individual
http://www.w3.org/2000/01/rdf-schema
  rdfs:subClassOf rdf:type rdf:Property
Link labels: TM (three links), EX (one link)
Demo6: Swoogle's top 10
  This report is dynamically generated from the latest data and takes 5 to 10 seconds.
  Swoogle uses a PageRank-like algorithm to rank semantic web documents. Well-known ontologies are highly ranked.
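The tutorial names a PageRank variation over typed inter-document links (EX, TM, IM, PV) but gives no formula. The sketch below is a generic weighted PageRank, not Swoogle's ranking code: the link-type weights, the damping factor of 0.85, the iteration count and the choice of which documents each sample link connects are all assumptions made for illustration.

```python
# Hypothetical weights per link type; the tutorial names the types only.
LINK_WEIGHTS = {"TM": 1.0, "EX": 2.0, "IM": 3.0, "PV": 3.0}


def weighted_pagerank(links, damping=0.85, iterations=50):
    """links: iterable of (source_doc, target_doc, link_type) tuples."""
    links = list(links)
    docs = sorted({doc for s, t, _ in links for doc in (s, t)})
    out = {doc: {} for doc in docs}
    for source, target, kind in links:
        out[source][target] = out[source].get(target, 0.0) + LINK_WEIGHTS[kind]
    rank = {doc: 1.0 / len(docs) for doc in docs}
    for _ in range(iterations):
        # Rank held by documents without outgoing links is spread evenly.
        dangling = sum(rank[doc] for doc in docs if not out[doc])
        base = (1 - damping) / len(docs) + damping * dangling / len(docs)
        new_rank = {doc: base for doc in docs}
        for source in docs:
            total = sum(out[source].values())
            for target, weight in out[source].items():
                # A surfer follows a link with probability proportional
                # to the weight of its type.
                new_rank[target] += damping * rank[source] * weight / total
        rank = new_rank
    return rank


FOAF_DOC = "http://www.cs.umbc.edu/~finin/foaf.rdf"
FOAF = "http://xmlns.com/foaf/1.0/"
WORDNET = "http://xmlns.com/wordnet/1.6/"
RDFS = "http://www.w3.org/2000/01/rdf-schema"

# Three TM links and one EX link, as in the example; endpoints are guessed.
links = [
    (FOAF_DOC, FOAF, "TM"),
    (FOAF, RDFS, "TM"),
    (WORDNET, RDFS, "TM"),
    (FOAF, WORDNET, "EX"),
]
for doc, score in sorted(weighted_pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {doc}")
```

With these made-up weights the RDF Schema document ends up ranked first, which matches the slide's remark that well-known ontologies are highly ranked.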
Statistics -- research
Summarize the dataset collected by Swoogle
Swoogle Watch
  Swoogle Today
  Distribution of visited URLs
  Document discovery log
  Term discovery log
Semantic Web Watch
  SWD distribution by last-modified month
  SWD distribution by website
  SWD distribution by suffix
Ontology Watch
  Term (class/property) usage
  Namespace usage
Demo5(a): Swoogle Today
Demo5(b): Swoogle Statistics
  [Chart labels: FOAF, Trustix, W3C, Stanford]
Demo5(c): Swoogle Statistics
Miscellaneous
Submit URL for the focused crawler
Swoogle Web Service (delivered in Sept.)
  http://swoogle.umbc.edu/webservice/
  Search document
  Search term
  Term digest
Demo7: Submit URL for focused crawler
  When you can’t find your ontologies in Swoogle, it may be that they are not indexed by Swoogle yet. Please submit them to increase their visibility.
  [Screenshot callouts: From site map; When your query fails]
3. Summary
Summary
Current Status
Summary
Swoogle (Mar, 2004)
  Automated SWD discovery
  SWD metadata creation and search
  Ontology rank (rational surfer model)
  Swoogle watch
  Web interface
Swoogle2 (Sep, 2004)
  Ontology dictionary
  Swoogle statistics
  Web service interface (WSDL)
  Bag of URIref IR search
Swoogle3
  Better discovery & revisit strategies
  Better navigation models
  Semantic web dataset
  Index instance data
  More metadata (ontology mapping)
  Better web service interfaces
[Timeline labels: 2004, 2005]
Current Status
Swoogle Watch reported (Jan 6, 2005)
  46.7 M triples
  336 K SWDs: 4 K ontologies
  153 K terms: 94 K classes & 59 K properties
Ongoing work
  Research
    Self-adaptive SWD discovery
    Efficient SWD digest and RDF graph abstract
    Semantic Web navigation model
  Engineering
    Enhancing the Web Service interface