The Semantic Web in Use: Analyzing FOAF Documents
UMBC: An Honors University in Maryland
Li Ding, Lina Zhou, Tim Finin and Anupam Joshi
University of Maryland, Baltimore County
DARPA contract F30602-00-0591 and NSF awards ITR-IIS-0326460 and ITR-IIS-0325464 provided partial research support for this work.
Outline
Motivation
Introduction
  The six most popular ontologies
  FOAF vocabulary
  Why FOAF
Building a FOAF document collection
  FOAF document identification
  FOAF document discovery
  Popular properties of foaf:Person
Applications
  Personal information fusion
  Social network analysis
The Semantic Web
The semantic web vision is that information and services are described using shared ontologies in KR-like markup languages, making them accessible to machines (programs).
How do we get there? What kind of ontologies: IEEE SUO? Cyc? What kind of languages: RDF? OWL? RuleML?
It’s reasonable to start with the simple and move toward the complex: from Dublin Core to Cyc, from RDF to OWL and beyond.
Significant semantic web content exists today, using simple vocabularies (e.g., FOAF) and RDF/RDFS.
The Semantic Web
The more important word in “Semantic Web” is the latter.
The KR aspects of the SW were taken off the shelf, the result of 25 years of research done in the AI community.
Remember hypertext? It was a nice research backwater going back to the ’50s (recall Memex and Xanadu). Hypertext was forever changed by the Web, so maybe the Web will forever change KR.
TBL: “The Semantic Web will globalize KR, just as the WWW globalized hypertext.”
Web of what?
What features does the web bring to the table?
“Anyone can say anything about anything.”
The meaning of RDF terms will be (partly) determined socially.
It’s a web of documents, services, agents and people.
What kind of ontologies?
[Figure, after Deborah L. McGuinness (Stanford): a spectrum running from simple ontologies (vocabularies, taxonomies) to expressive ontologies. Points along it: catalog/ID; terms/glossary; thesauri (“narrower term” relation); informal is-a; formal is-a; formal instance; frames (properties); value restriction; general logical constraints; disjointness, inverse, part of… Examples placed on the spectrum: WordNet, DB schema, OO, UMLS, RDF, RDFS, DAML, OWL, IEEE SUO, CYC.]
The Semantic Web Today
There are several simple RDF vocabularies that are widely used today: Dublin Core, RSS and FOAF.
It’s instructive to study how these are being used today, and to track how their usage changes.
The Six Most Popular Ontologies
RDF, DC (Dublin Core), RSS, FOAF, RDFS, MCVB
The statistics were generated by http://swoogle.umbc.edu
A use case: FOAF
FOAF (Friend of a Friend) is a simple ontology to describe people and their social networks. See the FOAF project page: http://www.foaf-project.org/
We recently crawled the web and discovered over 1,500,000 valid RDF FOAF files. Most of these are from several blogging systems that encode basic user info in FOAF. See http://apple.cs.umbc.edu/semdis/wob/foaf/
<foaf:Person>
  <foaf:name>Tim Finin</foaf:name>
  <foaf:mbox_sha1sum>2410…37262c252e</foaf:mbox_sha1sum>
  <foaf:homepage rdf:resource="http://umbc.edu/~finin/" />
  <foaf:img rdf:resource="http://umbc.edu/~finin/images/passport.gif" />
</foaf:Person>
FOAF vocabulary: http://xmlns.com/foaf/0.1/
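To make the example concrete, here is a minimal sketch, using only Python's standard-library xml.etree.ElementTree, that pulls the properties of each foaf:Person out of a small RDF/XML document. The sample document and the helper name `person_properties` are hypothetical; a real harvester would use a full RDF parser, because RDF/XML allows many equivalent serializations that this simple element walk does not handle.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical sample modeled on the slide's example (mbox_sha1sum omitted).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Tim Finin</foaf:name>
    <foaf:homepage rdf:resource="http://umbc.edu/~finin/"/>
    <foaf:img rdf:resource="http://umbc.edu/~finin/images/passport.gif"/>
  </foaf:Person>
</rdf:RDF>"""


def person_properties(xml_text):
    """Return one list of (property, value) pairs per foaf:Person element."""
    root = ET.fromstring(xml_text)
    people = []
    for person in root.iter(f"{{{FOAF}}}Person"):
        props = []
        for child in person:
            # ElementTree writes qualified names as "{namespace}local".
            if child.tag.startswith("{"):
                ns, local = child.tag[1:].split("}", 1)
            else:
                ns, local = "", child.tag
            name = f"foaf:{local}" if ns == FOAF else child.tag
            # Resource-valued properties use rdf:resource; literals use text.
            value = child.get(f"{{{RDF}}}resource")
            if value is None:
                value = (child.text or "").strip()
            props.append((name, value))
        people.append(props)
    return people


for props in person_properties(SAMPLE):
    print(props)
```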
FOAF: why RDF? Extensibility!
The FOAF vocabulary provides 50+ basic terms for making simple claims about people.
FOAF files can use other RDF terms too: RSS, MusicBrainz, Dublin Core, WordNet, Creative Commons, blood types, star signs, …
RDF guarantees freedom of independent extension; OWL provides fancier data-merging facilities.
Result: freedom to say what you like, using any RDF markup you want, and have RDF crawlers merge your FOAF documents with others’ and know when you’re talking about the same entities.
After Dan Brickley
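One way crawlers can tell that two documents talk about the same person is the FOAF property foaf:mbox_sha1sum, which the FOAF specification defines as the SHA-1 hash of a mailto: URI, so a mailbox can be matched without being published. The sketch below assumes that two descriptions whose mailbox hashes agree describe the same person; the helper names `mbox_sha1sum` and `identity_key` and the address alice@example.org are hypothetical.

```python
import hashlib
from typing import Optional


def mbox_sha1sum(mailbox: str) -> str:
    """SHA-1 hex digest of a mailbox URI, as used by foaf:mbox_sha1sum."""
    # The hashed string is the mailto: URI, so add the prefix if missing.
    uri = mailbox if mailbox.startswith("mailto:") else "mailto:" + mailbox
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def identity_key(desc: dict) -> Optional[str]:
    """Reduce foaf:mbox or foaf:mbox_sha1sum to one comparable key."""
    if "foaf:mbox_sha1sum" in desc:
        return desc["foaf:mbox_sha1sum"].strip().lower()
    if "foaf:mbox" in desc:
        return mbox_sha1sum(desc["foaf:mbox"])
    return None


# Two hypothetical descriptions from different documents: one publishes the
# mailbox, the other only its hash (the privacy-preserving form).
doc_a = {"foaf:name": "Alice", "foaf:mbox": "mailto:alice@example.org"}
doc_b = {"foaf:nick": "al", "foaf:mbox_sha1sum": mbox_sha1sum("alice@example.org")}

key_a, key_b = identity_key(doc_a), identity_key(doc_b)
merged = {**doc_a, **doc_b} if key_a is not None and key_a == key_b else None
print(merged)
```

Matching on the hash rather than the raw address is what makes the “unique person identification (privacy preserved)” idea work.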
No free lunch!
Consequence: we must plan for lies, mischief, mistakes, stale data and slander.
The dataset is out of control: distributed and dynamic.
Importance of knowing who said what: anyone can describe anyone, so we must record data provenance, and modeling and reasoning about trust is critical.
Legal, privacy and etiquette issues emerge.
Welcome to the real world.
After Dan Brickley
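Recording provenance can be as simple as keeping the source document next to every triple, i.e., storing quads instead of triples. The following is a minimal in-memory sketch of that idea; the `Quad` and `ProvenanceStore` names, the ex: identifiers and the example.org/example.net URLs are all hypothetical, and a production system would use a quad store or named graphs.

```python
from typing import List, NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # URL of the document that made the claim


class ProvenanceStore:
    """Keeps every triple together with the document that asserted it."""

    def __init__(self) -> None:
        self._quads: List[Quad] = []

    def add(self, subject: str, predicate: str, obj: str, source: str) -> None:
        self._quads.append(Quad(subject, predicate, obj, source))

    def sources_for(self, subject: str, predicate: str, obj: str) -> List[str]:
        """Which documents assert this exact triple (who said what)?"""
        return sorted({q.source for q in self._quads
                       if (q.subject, q.predicate, q.obj) == (subject, predicate, obj)})

    def claims_by(self, source: str) -> List[Quad]:
        """Everything a single document said, e.g., to discount it if untrusted."""
        return [q for q in self._quads if q.source == source]


store = ProvenanceStore()
store.add("ex:alice", "foaf:knows", "ex:bob", "http://example.org/alice.rdf")
store.add("ex:alice", "foaf:knows", "ex:bob", "http://example.org/bob.rdf")
store.add("ex:alice", "foaf:nick", "spammer", "http://example.net/prank.rdf")
print(store.sources_for("ex:alice", "foaf:knows", "ex:bob"))
```

A claim backed by two independent sources and a claim made by a single untrusted document can then be weighed differently by a trust model.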
FOAF example using XML
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Tim Finin</foaf:name>
    <foaf:mbox rdf:resource="mailto:[email protected]"/>
  </foaf:Person>
</rdf:RDF>
FOAF example using XML
<foaf:Person>
  <foaf:name>Tim Finin</foaf:name>
  <foaf:mbox rdf:resource="mailto:[email protected]"/>
  <foaf:nick>Tim</foaf:nick>
  <foaf:homepage rdf:resource="http://umbc.edu/~finin/"/>
  <foaf:img rdf:resource="http://umbc.edu/~finin/passport.gif"/>
</foaf:Person>
FOAF example using XML
<foaf:Person>
  <foaf:name>Tim Finin</foaf:name>
  <foaf:knows>
    <foaf:Person>
      <foaf:name>Anupam Joshi</foaf:name>
      <rdfs:seeAlso rdf:resource="http://umbc.edu/~joshi/joshi.foaf"/>
    </foaf:Person>
  </foaf:knows>
</foaf:Person>
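The nested foaf:knows structure is what makes FOAF a social network source, and rdfs:seeAlso is what lets a crawler hop from one document to the next. This minimal standard-library sketch extracts both from the example above, wrapped in an rdf:RDF element with namespace declarations. The helper name `knows_edges` is hypothetical, and identifying people by foaf:name is a deliberate simplification; merging instances properly is a separate step.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Tim Finin</foaf:name>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Anupam Joshi</foaf:name>
        <rdfs:seeAlso rdf:resource="http://umbc.edu/~joshi/joshi.foaf"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def knows_edges(xml_text):
    """Return (knower, known) name pairs plus the rdfs:seeAlso URLs to crawl next."""
    root = ET.fromstring(xml_text)
    edges, see_also = [], []
    for person in root.iter(f"{{{FOAF}}}Person"):
        name = person.findtext(f"{{{FOAF}}}name")
        # foaf:knows nests the friend's own foaf:Person description.
        for knows in person.findall(f"{{{FOAF}}}knows"):
            for friend in knows.findall(f"{{{FOAF}}}Person"):
                edges.append((name, friend.findtext(f"{{{FOAF}}}name")))
        # rdfs:seeAlso points at another document describing this person.
        for link in person.findall(f"{{{RDFS}}}seeAlso"):
            see_also.append(link.get(f"{{{RDF}}}resource"))
    return edges, see_also


edges, see_also = knows_edges(DOC)
print(edges)
print(see_also)
```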
FOAF isn’t the only one
Other ontologies are used to publish social information: Swoogle finds more than 360 RDFS or OWL classes with the local name “person”.
Lots of FOAF tools
Why FOAF
Information creators: community membership management; unique person identification (privacy preserved); indicating authorship.
Information consumers: provenance tracking; social networking (exposing community information to newcomers, matching interests); a trust building block.
Studying how FOAF is being used
What counts as a FOAF document?
How can we find FOAF documents?
Identifying a FOAF document
1. D is an RDF document.
2. D uses the FOAF namespace.
3. The RDF graph serialized by D contains the sub-graph: X rdf:type foaf:Person, and X foaf:Y Z, where foaf:Y is some FOAF property.
4. D defines one and only one Person instance.
D is a generic FOAF document when criteria 1, 2 and 3 are met; D is a strict FOAF document when criteria 1, 2, 3 and 4 are met.
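The four criteria translate directly into a check over a document's triples. The sketch below assumes the document has already been parsed into (subject, predicate, object) strings by some RDF parser, so criterion 1 is taken as satisfied; the function name `classify_foaf_document` is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
FOAF_PERSON = FOAF + "Person"


def classify_foaf_document(triples):
    """Return "strict", "generic", or None for one parsed RDF document.

    `triples` is a list of (subject, predicate, object) strings produced by
    any RDF parser; criterion 1 (D is an RDF document) is assumed to hold
    because the document parsed.
    """
    # Criterion 2: some term in the graph comes from the FOAF namespace.
    uses_foaf = any(term.startswith(FOAF) for triple in triples for term in triple)
    # Instances typed as foaf:Person (the X nodes of the sub-graph).
    persons = {s for s, p, o in triples if p == RDF_TYPE and o == FOAF_PERSON}
    # Criterion 3: some person X also has a FOAF property foaf:Y pointing at Z.
    has_subgraph = any(s in persons and p.startswith(FOAF) for s, p, _ in triples)
    if not (uses_foaf and has_subgraph):
        return None
    # Criterion 4: exactly one Person instance makes the document strict.
    return "strict" if len(persons) == 1 else "generic"


doc = [("_:a", RDF_TYPE, FOAF_PERSON), ("_:a", FOAF + "name", "Tim Finin")]
print(classify_foaf_document(doc))
```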
Different FOAF collections
DS-Swoogle: FOAF documents selected from Swoogle’s database of ~340K semantic web documents. Swoogle selects at most 1,000 documents from any site.
DS-FOAF: a custom crawler found 1.5M FOAF documents, most from a few large blog sites (e.g., LiveJournal).
DS-FOAF-Small: a subset of ~7K non-blog FOAF documents from ~1K sites, defining ~37K people.
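A per-site cap like Swoogle’s 1,000-document limit keeps a few huge blog hosts from dominating a sample. How Swoogle chooses which documents to keep is not stated here; this hypothetical `cap_per_site` helper simply keeps the first ones seen for each host, and the example.* URLs are made up.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def cap_per_site(urls, limit=1000):
    """Keep at most `limit` URLs per host, in the order they were seen."""
    kept, per_host = [], defaultdict(int)
    for url in urls:
        host = urlsplit(url).netloc.lower()
        if per_host[host] < limit:
            per_host[host] += 1
            kept.append(url)
    return kept


urls = ["http://a.example/1.rdf", "http://a.example/2.rdf",
        "http://a.example/3.rdf", "http://b.example/1.rdf"]
print(cap_per_site(urls, limit=2))
```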
FOAF document discovery
Bootstrap: using a web search engine (got 10,000 documents).
Discovery: using rdfs:seeAlso semantics (got 1.5M documents).
[Chart: top 7 FOAF websites]
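The two-phase discovery amounts to a breadth-first traversal: seed documents come from a search engine, and each fetched document contributes its rdfs:seeAlso targets to the frontier. In this sketch the web is simulated by a dictionary from document URL to the seeAlso URLs it contains; the function name `discover` and the document names are hypothetical, and a real crawler would fetch and parse each document instead.

```python
from collections import deque


def discover(seeds, see_also, max_docs=None):
    """Breadth-first discovery of FOAF documents.

    seeds: document URLs from the bootstrap step (e.g., search-engine hits).
    see_also: mapping from a document URL to the rdfs:seeAlso URLs found in it;
              in a real crawler this would come from fetching and parsing.
    """
    frontier = deque(dict.fromkeys(seeds))  # de-duplicate, keep order
    seen = set(frontier)
    visited = []
    while frontier and (max_docs is None or len(visited) < max_docs):
        doc = frontier.popleft()
        visited.append(doc)
        for nxt in see_also.get(doc, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return visited


links = {"a.rdf": ["b.rdf", "c.rdf"], "b.rdf": ["d.rdf"], "c.rdf": ["a.rdf", "d.rdf"]}
print(discover(["a.rdf"], links))
```

The `seen` set matters because FOAF documents frequently point back at each other, as a.rdf and c.rdf do here.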
From DS-Swoogle
17 SWDs add to the definition of foaf:Person, e.g., by defining superclasses, disjointness, etc.
162 properties are defined for foaf:Person, e.g., properties whose domain is foaf:Person.
74 properties are defined as relations between people, e.g., properties with both domain and range of foaf:Person.
582 properties are used, e.g., to assert something of a foaf:Person instance.
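The “defined for foaf:Person” and “relations between people” counts can be read off rdfs:domain and rdfs:range declarations. This minimal sketch shows the idea without any RDFS/OWL inference, so properties attached to subclasses or superclasses of foaf:Person would be missed; the helper name `person_property_sets` and the ex:bloodType property are hypothetical, while foaf:knows does have domain and range foaf:Person in the FOAF vocabulary.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"
PERSON = FOAF + "Person"


def person_property_sets(triples):
    """Split schema triples into properties defined for foaf:Person.

    Returns (domain_person, person_to_person): properties whose rdfs:domain
    is foaf:Person, and the subset whose rdfs:range is foaf:Person too.
    No inference is done (e.g., subclasses of foaf:Person are ignored).
    """
    domain = {s for s, p, o in triples if p == RDFS + "domain" and o == PERSON}
    range_ = {s for s, p, o in triples if p == RDFS + "range" and o == PERSON}
    return domain, domain & range_


schema = [
    (FOAF + "knows", RDFS + "domain", PERSON),
    (FOAF + "knows", RDFS + "range", PERSON),
    ("http://example.org/ns#bloodType", RDFS + "domain", PERSON),  # hypothetical
]
defined, relations = person_property_sets(schema)
print(len(defined), len(relations))
```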
Popular properties of foaf:Person: top 10 popular properties (per document)
Rank | non-blog (26,936) | liveJournal.com (20,298,073) | DS-FOAF-SMALL* (33,790)
1 | foaf:mbox_sha1sum (0.84) | foaf:mbox_sha1sum (1.0) | foaf:name (0.80)
2 | foaf:homepage (0.66) | dc:description (1.0) | foaf:mbox_sha1sum (0.71)
3 | foaf:name (0.64) | dc:title (1.0) | foaf:nick (0.51)
4 | foaf:nick (0.61) | foaf:nick (1.0) | foaf:homepage (0.40)
5 | foaf:weblog (0.60) | foaf:page (1.0) | foaf:depiction (0.35)
6 | foaf:knows (0.44) | foaf:weblog (0.99) | foaf:weblog (0.30)
7 | foaf:mbox (0.38) | rdfs:seeAlso (0.85) | foaf:knows (0.28)
8 | foaf:img (0.38) | foaf:knows (0.85) | foaf:surname (0.27)
9 | bio:olb (0.35) | foaf:dateOfBirth (0.71) | foaf:firstName (0.26)
10 | rdfs:seeAlso (0.34) | foaf:interest (0.67) | rdfs:seeAlso (0.26)
11 | | | foaf:mbox (0.26)
*DS-FOAF-SMALL is a new dataset (Oct 2004), based on 7,276 evenly sampled documents.
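The slides do not spell out how the per-document figures were computed. One plausible reading, assumed here, is the fraction of documents in which a property is used at least once; the sketch below computes that from hypothetical (person, property) uses, and the helper name `per_document_frequency` is made up.

```python
from collections import Counter


def per_document_frequency(docs):
    """Fraction of documents that use each property at least once.

    docs: one list per document of (person_id, property) uses.
    """
    counts = Counter()
    for doc in docs:
        counts.update({prop for _, prop in doc})  # count each property once per doc
    return {prop: n / len(docs) for prop, n in counts.items()}


docs = [
    [("p1", "foaf:name"), ("p1", "foaf:nick"), ("p2", "foaf:name")],
    [("p3", "foaf:name")],
]
print(per_document_frequency(docs))
```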
Popular properties of foaf:Person: top 10 popular properties (per instance)
Rank | non-blog (26,936) | liveJournal.com (20,298,073) | DS-FOAF-SMALL* (33,790)
1 | foaf:name (0.84) | dc:title (1.74) | foaf:name (0.69)
2 | foaf:knows (0.79) | foaf:interest (1.68) | foaf:mbox_sha1sum (0.65)
3 | foaf:homepage (0.63) | foaf:nick (1.04) | rdfs:seeAlso (0.39)
4 | foaf:mbox_sha1sum (0.51) | foaf:weblog (1.00) | foaf:nick (0.26)
5 | rdfs:seeAlso (0.40) | rdfs:seeAlso (0.99) | foaf:homepage (0.18)
6 | dc:title (0.31) | foaf:knows (0.95) | foaf:mbox (0.15)
7 | foaf:nick (0.22) | foaf:page (0.95) | foaf:weblog (0.15)
8 | foaf:weblog (0.18) | dc:description (0.046) | foaf:firstName (0.11)
9 | foaf:mbox (0.15) | foaf:mbox_sha1sum (0.046) | foaf:surname (0.11)
10 | daml:equivalentTo (0.13) | foaf:dateOfBirth (0.046) | foaf:depiction (0.10)
11 | | | foaf:knows (0.07)
*DS-FOAF-SMALL is a new dataset (Oct 2004), based on 7,276 evenly sampled documents.
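For the per-instance table, values above 1.0 suggest an average number of assertions per foaf:Person instance rather than a fraction; that reading is an assumption, not something the slides state. This sketch computes it from hypothetical (person, property) uses, treating person ids as local to their document; the helper name `per_instance_frequency` is made up.

```python
from collections import Counter


def per_instance_frequency(docs):
    """Average number of assertions of each property per foaf:Person instance.

    docs: one list per document of (person_id, property) uses; person ids are
    assumed local to their document, and only instances with at least one
    property use are counted.
    """
    uses, instances = Counter(), set()
    for doc_index, doc in enumerate(docs):
        for person, prop in doc:
            instances.add((doc_index, person))
            uses[prop] += 1
    return {prop: n / len(instances) for prop, n in uses.items()}


docs = [
    [("p1", "foaf:name"), ("p1", "foaf:interest"), ("p1", "foaf:interest")],
    [("p1", "foaf:name"), ("p2", "foaf:interest")],
]
print(per_instance_frequency(docs))
```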
Extracting social networks
Three steps:
1. Discovering FOAF instances
2. Merging instances representing the same person
3. Linking people via foaf:knows and other FOAF-based relations, e.g., quaffing:drankBeerWith
Integrating other SNA data, e.g., co-author relationships mined from CiteSeer
Merging instances
Named instances
Inverse functional properties
Sets of nearly inverse functional properties
OWL constraints
rdfs:seeAlso
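Steps 2 and 3 can be sketched with a union-find structure: instances sharing a value for an inverse functional property are fused into one group, and foaf:knows pairs are then mapped onto groups. This sketch covers only the inverse-functional-property rule (foaf:mbox_sha1sum and foaf:homepage are declared inverse functional in the FOAF vocabulary); nearly-IFP sets, OWL constraints and rdfs:seeAlso are not handled. The helper names `merge_instances` and `link` and the instance ids and values are hypothetical.

```python
def merge_instances(instances, ifps=("foaf:mbox_sha1sum", "foaf:homepage")):
    """Group person instances that share a value for an inverse functional property.

    instances: mapping instance_id -> {property: value}.
    Returns mapping instance_id -> representative id of its fused group.
    """
    parent = {i: i for i in instances}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_owner = {}  # (property, value) -> first instance seen with it
    for inst, props in instances.items():
        for prop in ifps:
            if prop in props:
                key = (prop, props[prop])
                if key in first_owner:
                    union(first_owner[key], inst)
                else:
                    first_owner[key] = inst
    return {i: find(i) for i in instances}


def link(groups, knows_pairs):
    """Step 3: map foaf:knows pairs onto fused groups, dropping self-loops."""
    return {(groups[a], groups[b]) for a, b in knows_pairs if groups[a] != groups[b]}


instances = {
    "doc1#me": {"foaf:mbox_sha1sum": "abc123"},
    "doc2#tim": {"foaf:mbox_sha1sum": "abc123", "foaf:homepage": "http://h.example/"},
    "doc3#t": {"foaf:homepage": "http://h.example/"},
    "doc3#x": {"foaf:name": "X"},
}
groups = merge_instances(instances)
print(groups)
print(link(groups, [("doc3#x", "doc2#tim"), ("doc1#me", "doc3#t")]))
```

Note that doc1#me and doc3#t share no property directly; they are fused transitively through doc2#tim, which is exactly how one incorrect document can wrongly fuse many instances.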
Collecting Personal Information
http://www.cs.umbc.edu/~dingli1/foaf.rdf
http://www-2.cs.cmu.edu/People/fgandon/foaf.rdf
Caution: Collision? Mistake!
http://www.mindswap.org/~katz/2002/11/jordan.foaf
http://www.ilrt.bris.ac.uk/people/cmdjb/webwho.xrdf
SNA1: Instances of foaf:Person per document
Zipf’s distribution
Sloppy tail: a few FOAF documents contain thousands of instances
[Figure: cumulative distribution on log-log axes; x-axis: # of persons; y-axis: # of FOAF documents]
SNA2: Instances of foaf:Person per group
Zipf’s distribution
Sloppy tail: some instances are wrongly fused due to incorrect FOAF documents
[Figure: cumulative distribution on log-log axes; x-axis: group size (# of persons); y-axis: # of groups]
A group refers to a fused person.
Degree analysis
For social networks, the in-degree and out-degree of a person are of interest. They can be used to identify hubs and authorities, or to compute other interesting properties or rankings.
Analyzing most large social networks reveals that in-degree and out-degree follow a power-law or Zipf distribution.
We found this to be the case for social networks induced by FOAF documents.
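The degree plots that follow can be reproduced from a directed edge list. This sketch counts in- and out-degrees, builds a cumulative distribution (assumed here to mean, for each degree k, the number of nodes with degree of at least k, a common choice for such plots), and fits a least-squares slope on log-log axes as a rough power-law check. All helper names and the toy edge list are hypothetical, and the slope fit is a quick diagnostic rather than a rigorous power-law estimator.

```python
import math
from collections import Counter


def degrees(edges):
    """In- and out-degree of every node that appears in a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg


def cumulative_counts(values):
    """Map each observed value k to the number of items with value >= k."""
    hist = Counter(values)
    result, running = {}, 0
    for k in sorted(hist, reverse=True):
        running += hist[k]
        result[k] = running
    return dict(sorted(result.items()))


def loglog_slope(points):
    """Least-squares slope of log(count) against log(k), for k > 0.

    A roughly straight line on log-log axes is the usual quick check for a
    power-law (Zipf-like) distribution.
    """
    pts = [(math.log(k), math.log(c)) for k, c in points.items() if k > 0 and c > 0]
    if len(pts) < 2:
        return None
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    var = sum((x - mx) ** 2 for x, _ in pts)
    cov = sum((x - mx) * (y - my) for x, y in pts)
    return cov / var if var else None


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]
in_deg, out_deg = degrees(edges)
print(cumulative_counts(in_deg.values()))
```

Nodes with zero in-degree never appear in `in_deg`, so they drop out of the cumulative counts; on log-log axes they could not be plotted anyway.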
SNA3: In-degree of group
Zipf’s distribution
Sharp tail: few FOAF documents have large in-degrees
[Figure: cumulative distribution on log-log axes; x-axis: in-degree of group; y-axis: # of groups]
SNA4: Out-degree of group
Zipf’s distribution
Sloppy tail: a few person-directory documents
[Figure: cumulative distribution on log-log axes; x-axis: out-degree per group; y-axis: # of groups]
SNA5: Patterns of FOAF network
Four types of group: isolated; only in (only one inlink: 97%); only out; both (intermediate)
Basic patterns:
Singleton: an isolated group
Star (only out): an active person publishes friends
Clique: a small group
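The four group types follow directly from whether a group has any incoming and any outgoing foaf:knows links. A minimal sketch, assuming groups are already fused and edges point from the person who claims to know someone to the person known; the helper name `classify_groups` and the toy network are hypothetical.

```python
from collections import Counter


def classify_groups(nodes, edges):
    """Label each (fused) person as isolated, only in, only out, or both."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:  # src foaf:knows dst
        out_deg[src] += 1
        in_deg[dst] += 1
    kinds = {}
    for n in nodes:
        has_in, has_out = in_deg[n] > 0, out_deg[n] > 0
        if has_in and has_out:
            kinds[n] = "both"
        elif has_in:
            kinds[n] = "only in"
        elif has_out:
            kinds[n] = "only out"
        else:
            kinds[n] = "isolated"
    return kinds


# Hypothetical network: "a" is the hub of a star, "e" is a singleton.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("c", "d")]
kinds = classify_groups(nodes, edges)
print(kinds)
print(Counter(kinds.values()))
```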
SNA6: Size of components
Zipf’s distribution
Sloppy head: singletons
Sloppy tail: blog websites (e.g., www.livejournal.com)
[Figure: cumulative distribution on log-log axes; x-axis: # of groups per connected component; y-axis: # of connected components]
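Component sizes can be computed with a breadth-first search over the network. The slides do not say whether link direction is taken into account; this sketch assumes weakly connected components (direction ignored). The helper name `component_sizes` and the toy data are hypothetical, and `nodes` should list every group, including singletons, since a group with no links would otherwise be missed.

```python
from collections import Counter, defaultdict, deque


def component_sizes(nodes, edges):
    """Sizes of connected components, ignoring edge direction, largest first."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes.append(size)
    return sorted(sizes, reverse=True)


nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
sizes = component_sizes(nodes, edges)
print(sizes)
print(Counter(sizes))  # component size -> number of components
```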
SNA7: Growth of FOAF network
The data suggests that there is a natural evolution for a social network:
(1) it starts as disjoint, star-like connected components,
(2) which link together to form trees and forests,
(3) eventually forming a scale-free network.
SNA7: Growth of FOAF network
[Figure: stages 1, 2 and 3 of the network’s growth]
The Map of FOAF network
[Figure: map of the FOAF network, June 2004, with regions labeled www.livejournal.com, www.ecademy.com, blog.livedoor.jp and non-blog]
Conclusions
The semantic web is evolving: there is a growing volume of RDF content.
FOAF is one of the early successes, and FOAF data is being used.
FOAF data is relatively easy to collect and analyze.
FOAF data is a good source of social network information.
Questions?
Demo: http://apple.cs.umbc.edu/semdis
Swoogle: http://swoogle.umbc.edu/
ebiquity group: http://ebiquity.umbc.edu