Introduction to the Semantic Web and Linked Open Data

Upload: amanda-stanley

Post on 26-Dec-2015


Dramatis Personae

Nick Gibbins (in spirit)

Christopher Gutteridge

• Overview of issues relating to the publication and use of linked data in HEIs

• The lessons that we’ve learned!
• Pragmatism rather than perfection
• General guidelines rather than detailed specifications
• Coining cool URIs
• Publication alongside existing resources
• Licensing

Goals

http://is.gd/dqiJc (The only URL you need to write down)

• Detailed tutorial on the finer points of:
  • RDF
  • RDFa
  • RDF Schema
  • OWL
  • SPARQL
  • …

(an hour and a half isn’t enough for this – and there are good tutorials available online)

Non-Goals

“If HP knew what HP knows, we’d be three times more profitable”

Lew Platt, Hewlett-Packard Chairman and CEO

Linked Data in a Nutshell

http://www.flickr.com/photos/arielarielariel/322301228/

• Linked Data is about providing structured data on the Web

• Doesn’t necessarily require RDF (though it usually uses it)

• An underlying model of triples is used to describe the relations between entities in linked data

• This is the basis of the RDF data model

• (subject, predicate, object)
• e.g. “The Hobbit”, “created by”, “JRR Tolkien”

The triple

[Diagram: The Hobbit (subject) -> created by (predicate) -> JRR Tolkien (object)]
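As a minimal sketch of the same idea in code, a triple can be held as a three-field record. The class name `Triple` is an illustrative choice, not something from the slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate and an object."""
    subject: str
    predicate: str
    object: str


# The example from the slide, written as a single triple.
hobbit = Triple("The Hobbit", "created by", "JRR Tolkien")
print(f"{hobbit.subject} -> {hobbit.predicate} -> {hobbit.object}")
```

Because a `NamedTuple` is still a plain tuple, a whole dataset is simply a collection of these.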

• Take a citation:
  Tim Berners-Lee, James Hendler and Ora Lassila. The Semantic Web. Scientific American, May 2001

• We can identify a number of distinct statements in this citation:
  • There is an article titled “The Semantic Web”
  • One of its authors is a person named “Tim Berners-Lee” (etc)
  • It appeared in a publication titled “Scientific American”
  • It was published in May 2001

Example

• We can represent these statements graphically:

Example

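[Diagram: the citation as a graph. The article node has title “The Semantic Web” and date 2001-05, three creator links to person nodes named Tim Berners-Lee, James Hendler and Ora Lassila, and a publishedIn link to a journal node with title “Scientific American”.]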
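The citation graph can be sketched as a set of triples and queried by pattern matching. The node names (`article`, `person1`, `journal` and so on) are hypothetical placeholders for the unlabelled nodes in the diagram; the property names are the ones shown on the slide.

```python
# The citation graph as a set of (subject, predicate, object) triples.
# Node names like "article" and "person1" are hypothetical placeholders.
graph = {
    ("article", "title", "The Semantic Web"),
    ("article", "date", "2001-05"),
    ("article", "publishedIn", "journal"),
    ("article", "creator", "person1"),
    ("article", "creator", "person2"),
    ("article", "creator", "person3"),
    ("person1", "name", "Tim Berners-Lee"),
    ("person2", "name", "James Hendler"),
    ("person3", "name", "Ora Lassila"),
    ("journal", "title", "Scientific American"),
}


def objects(triples, subject, predicate):
    """Return every object attached to `subject` by `predicate`."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


# Two hops: article -> creator -> name.
author_names = sorted(
    name
    for person in objects(graph, "article", "creator")
    for name in objects(graph, person, "name")
)
print(author_names)
```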

Example

• There are two types of node in this graph:
  • Literals, which have a value but no identity (a string, a number, a date)
  • Resources, which represent objects with identity (a web page, a person, a journal)

[Diagram: an example node labelled “Scientific American”]

• Resources are identified by URIs
• Property labels are also identified by URIs, and are drawn from a vocabulary or ontology

Example

[Diagram: http://www.sciam.com/ (subject) -> http://purl.org/dc/elements/1.1/title (predicate) -> “Scientific American” (object)]
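One common way to write such a triple down is the line-based N-Triples syntax: URIs in angle brackets, literals in double quotes, and a full stop at the end. The helper below is a simplified sketch; the function name is illustrative and only backslashes and double quotes are escaped.

```python
def to_ntriples(subject_uri, predicate_uri, obj, obj_is_uri=False):
    """Format one triple as an N-Triples line (simplified escaping only)."""
    if obj_is_uri:
        obj_term = f"<{obj}>"
    else:
        # Escape backslashes first, then double quotes, inside the literal.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    return f"<{subject_uri}> <{predicate_uri}> {obj_term} ."


line = to_ntriples(
    "http://www.sciam.com/",
    "http://purl.org/dc/elements/1.1/title",
    "Scientific American",
)
print(line)
```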

• The triple-based graph model makes it possible to mix terms from different vocabularies in the same graph

• Simplifies the task of information integration

Mixing Vocabularies

[Diagram: the same citation graph as before, with its properties annotated by the vocabulary each is drawn from: foaf, dc and bibo]
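Mixing vocabularies works because every prefixed name is only shorthand for a full URI. A minimal sketch of prefix expansion follows; the `ex:` prefix and its node names are hypothetical, and the choice of which term describes which node is an assumption made for illustration rather than a reading of the diagram.

```python
# Namespace URIs for the vocabularies named on the slide, plus rdf.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "bibo": "http://purl.org/ontology/bibo/",
    "ex": "http://example.org/",  # hypothetical namespace for our own nodes
}


def expand(prefixed_name):
    """Turn a prefixed name such as 'dc:title' into a full URI."""
    prefix, _, local = prefixed_name.partition(":")
    return PREFIXES[prefix] + local


# One small graph using terms from three vocabularies side by side.
mixed = [
    ("ex:article", "rdf:type", "bibo:Article"),
    ("ex:article", "dc:creator", "ex:tbl"),
    ("ex:tbl", "foaf:name", "Tim Berners-Lee"),
]

for s, p, o in mixed:
    o_full = expand(o) if o.split(":")[0] in PREFIXES else o
    print(expand(s), expand(p), o_full)
```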

Set of publishing practices for SW data:

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information
4. Include links to other URIs, so that they can discover more things

Effectively, putting the hypertext back into the Semantic Web

Simplifies integration between datasets while maintaining loose coupling

Linked Data Principles

Example

[Diagram: the citation graph split into five small linked graphs, one per resource]

graph describing ‘sw’: sw title “The Semantic Web”; sw date 2001-05; sw creator tbl, jh, ora; sw publishedIn sciam

graph describing ‘tbl’: tbl name “Tim Berners-Lee”

graph describing ‘jh’: jh name “James Hendler”

graph describing ‘ora’: ora name “Ora Lassila”

graph describing ‘sciam’: sciam title “Scientific American”
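The five small graphs can be stitched back together by following links, which is the point of principle 4. The sketch below simulates this without any network access: the `WEB` dictionary stands in for looking up each URI, and the short names (`sw`, `tbl`, ...) stand in for full HTTP URIs.

```python
from collections import deque

# Simulated web: looking up a name returns the small graph that describes it.
WEB = {
    "sw": {
        ("sw", "title", "The Semantic Web"),
        ("sw", "date", "2001-05"),
        ("sw", "creator", "tbl"),
        ("sw", "creator", "jh"),
        ("sw", "creator", "ora"),
        ("sw", "publishedIn", "sciam"),
    },
    "tbl": {("tbl", "name", "Tim Berners-Lee")},
    "jh": {("jh", "name", "James Hendler")},
    "ora": {("ora", "name", "Ora Lassila")},
    "sciam": {("sciam", "title", "Scientific American")},
}


def follow_links(start, lookup):
    """Breadth-first: fetch a graph, then fetch every resource it links to."""
    merged, seen, queue = set(), set(), deque([start])
    while queue:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        for triple in lookup.get(uri, ()):
            merged.add(triple)
            linked = triple[2]
            if linked in lookup and linked not in seen:
                queue.append(linked)
    return merged


merged = follow_links("sw", WEB)
print(len(merged), "triples discovered starting from 'sw'")
```

Each publisher only maintains its own small graph; the consumer does the joining, which is what keeps the coupling loose.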

• URIs must only identify one concept. Ever.
• I am not my homepage.

[Diagram: Person vs. Document]

• URI represents a person.

• Requesting URI via web gets a “See Other” response.

• Requester is redirected to the most appropriate document URL, usually HTML or RDF+XML

Publishing Example
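The redirect described above can be sketched as a pure function from a request to a response. This is an assumption-laden illustration: real content negotiation also weighs `q` values in the Accept header, and the `.rdf` / `.html` suffixes are borrowed from the URI pattern later in these slides rather than required by any standard.

```python
def see_other(thing_uri, accept_header):
    """Answer a request for a non-document URI with a 303 redirect.

    Returns (status_code, headers). The suffix convention is an assumption.
    """
    wants_rdf = "application/rdf+xml" in accept_header.lower()
    suffix = ".rdf" if wants_rdf else ".html"
    return 303, {"Location": thing_uri + suffix}


person = "http://mysite.org/person/username/t23"
print(see_other(person, "application/rdf+xml"))  # a data client
print(see_other(person, "text/html"))            # a web browser
```

The person URI itself never returns a document; it only points at one.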

<<>><<><>><>>><>><>><>><>><><>>>><<><><<<<<><><><><><><><><><><><><><<<<>>><><<><><>><>

• DON’T worry about understanding the XML. It’s the equivalent of “view-source” in a webpage!

• Use a tool to convert it to something less icky! (http://graphite.ecs.soton.ac.uk/browser/ for example)

Publishing RDF
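For the curious, here is roughly what such a tool does, in miniature. This sketch uses only the Python standard library and handles just the simplest shape of RDF/XML (flat `rdf:Description` elements); it is not a real RDF parser, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A made-up, minimal RDF/XML document.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.sciam.com/">
    <dc:title>Scientific American</dc:title>
  </rdf:Description>
</rdf:RDF>"""


def simple_triples(rdf_xml):
    """Extract triples from flat rdf:Description elements only."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for node in root.findall(f"{{{RDF_NS}}}Description"):
        subject = node.get(f"{{{RDF_NS}}}about")
        for prop in node:
            # ElementTree tags look like "{namespace}local"; join the parts.
            predicate = prop.tag.replace("{", "").replace("}", "")
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


for triple in simple_triples(SAMPLE):
    print(triple)
```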

• Worry about it later!

• Start with data you can make freely available

Access Control

• You want your data to be used & reused, right?
• Don’t prevent commercial use.
• Don’t prevent derivative works (prevents people using it at all!)
• If there are any things which your data should not be used for, why are you publishing it?

Licensing

• Must-Attribute license
• Public Domain license (your info still can’t be used in illegal ways, of course)
• Procrastinate and worry about it later (much better than not publishing your data)

Licensing Options

Breakout

• What datasets does your organisation already maintain?

• What is the business case for making them available?
  • in a machine readable form
  • to all members
  • without bureaucracy or restriction.

• What are the barriers to putting them online and maintaining them?

• What are the benefits to the wider community?

• What are the risks?

Task

• List your 3 easiest wins - the lowest hanging fruit.

• Starting suggestion: every building & campus in your organisation, with:
  • Number
  • Building Name
  • Site (Campus)
  • Lat & Long
  This data changes very slowly and is also already freely available.

Task
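As a sketch of how small this first win can be, the snippet below turns a buildings spreadsheet into N-Triples lines. The sample row, the `http://example.org/` URIs and the `site` predicate are all made up for illustration; `rdfs:label` and the GEO `lat` / `long` properties are existing terms.

```python
import csv
import io

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/building/"  # hypothetical URI base
SITE = "http://example.org/ns/site"    # hypothetical predicate

# Made-up sample data in the shape suggested on the slide.
BUILDINGS_CSV = """number,name,site,lat,long
1,Example Building,Example Campus,50.0,-1.0
"""


def buildings_to_ntriples(csv_text):
    """One N-Triples line per cell; values are assumed to need no escaping."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['number']}>"
        for predicate, value in [
            (RDFS_LABEL, row["name"]),
            (SITE, row["site"]),
            (GEO + "lat", row["lat"]),
            (GEO + "long", row["long"]),
        ]:
            lines.append(f'{subject} <{predicate}> "{value}" .')
    return lines


for line in buildings_to_ntriples(BUILDINGS_CSV):
    print(line)
```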

ECS Demo

• http://id.ecs.soton.ac.uk/docs/

• http://rdf.ecs.soton.ac.uk/person/1248

• http://rdf.ecs.soton.ac.uk/project/42

Cool URIs

Beauty

• http://domain/classOfThing/scheme/identifier
• http://domain/classOfThing/scheme/identifier.rdf
• http://domain/classOfThing/scheme/identifier.html

• http://mysite.org/person/username/t23
• http://mysite.org/person/username/t23.rdf
• http://mysite.org/person/username/t23.html

Scheme is optional, but it futureproofs you against the next time the university reorganises everything.
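The pattern above is regular enough to generate. A minimal sketch, with a hypothetical function name, that builds all three URIs for one thing:

```python
from urllib.parse import quote


def cool_uris(domain, class_of_thing, scheme, identifier):
    """Build the thing URI plus its RDF and HTML document URIs."""
    # Percent-encode each path segment so odd identifiers stay valid.
    path = "/".join(quote(part, safe="") for part in (class_of_thing, scheme, identifier))
    base = f"http://{domain}/{path}"
    return {"thing": base, "rdf": base + ".rdf", "html": base + ".html"}


uris = cool_uris("mysite.org", "person", "username", "t23")
for kind, uri in uris.items():
    print(kind, uri)
```

If usernames are later replaced by staff numbers, only the `scheme` segment changes and the old URIs can keep working.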

And The Beast

http://www.diy.com/diy/jsp/bq/nav.jsp?action=detail&fh_oneslice=true&fh_view_size=10&fh_reffacet=styleStyle&fh_location=%2f%2fcatalog01%2fen_GB%2fcategories%3C{9372014}%2fcategories%3C{9372039}%2fcategories%3C{9372150}%2fspecificationsProductType%3done_hole_taps%2fstyleStyle%3E{adelaide}&fh_refview=summary&fh_refpath=facet_159017215&fh_secondid=10507747&fh_eds=%C3%9F&ts=1279018688652

Further Reading

http://www.flickr.com/photos/markhillary/337685031/

• http://www.w3.org/standards/semanticweb/
• http://www.w3.org/standards/techs/rdf
• http://www.w3.org/standards/techs/owl
• http://www.w3.org/TR/swbp-vocab-pub/

W3C Specifications

Tools

• Graphite Browser
  • http://graphite.ecs.soton.ac.uk/browser/
• Tabulator
  • http://www.w3.org/2005/ajar/tab

Linked Data Help

• Linked Data Website
  • http://linkeddata.org/
• The Patterns Book
  • http://patterns.dataincubator.org/book/
• Semantic Overflow
  • http://www.semanticoverflow.com/

• SKOS (Simple Knowledge Organisation Scheme)
  • Taxonomies and thesauri
• SIOC (Semantically Interlinked Online Communities)
  • Web forums, mailing lists, etc
• FOAF (Friend of a Friend)
  • People, social networks
• DC (Dublin Core)
  • Basic bibliographic information
• BIBO (Bibliographic Ontology)
  • Advanced bibliographic information
• GEO
  • Simple geolocation (lat/long) ontology

Common Namespaces

Cool URIs

• Cool URIs don't change (by TimBL)
  • http://www.w3.org/Provider/Style/URI
• Cool URIs for the Semantic Web
  • http://www.w3.org/TR/cooluris/
• ECS URI scheme documentation
  • http://id.ecs.soton.ac.uk/docs/

Infrastructure Namespaces

• RDF & RDFS
  • These describe classes & predicates which are used to tie everything together.
  • rdf:type is used to give a URI a class:
    <http://id.ecs.soton.ac.uk/person/1248> rdf:type foaf:Person .
• OWL
  • Used to describe the meaning of predicates & classes in machine-readable form.
  • Start with human-readable documents; OWL is not widely consumed (yet?)
• XSD
  • Describes datatypes like String, Positive Integer etc.
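These namespaces show up directly in serialised data. The sketch below writes the `rdf:type` statement from this slide in full N-Triples form and attaches XSD datatypes to literals; the helper name and the choice of `gYearMonth` for the "2001-05" date used earlier are illustrative assumptions.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def typed_literal(value, xsd_type):
    """Write a literal with an explicit XSD datatype, N-Triples style."""
    return f'"{value}"^^<{XSD}{xsd_type}>'


# The rdf:type example from the slide with prefixes expanded.
type_triple = f"<http://id.ecs.soton.ac.uk/person/1248> <{RDF_TYPE}> <{FOAF_PERSON}> ."

print(type_triple)
print(typed_literal("2001-05", "gYearMonth"))
print(typed_literal("42", "positiveInteger"))
```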

Take Home Messages

http://www.flickr.com/photos/71894657@N00/2696793132/

• ‘Cool URIs don’t change’ – once you’ve chosen a URI convention for your organisation, it’s a pain to change it

• Getting this right is key to having your linked data used more widely

We think that we got this one mostly right…
…but we still had too many anonymous nodes around

Good URI Selection

• Go for an incremental approach
• …but keep an eye on possible avenues for future expansion

• RDFa is not for beginners!

• Don’t do as we did: we tried to build linked data for all of our internal data in one go

Start with the easy stuff

• Regardless of your application domain, there is probably already an ontology that does some of what you want

• …but don’t be afraid to invent relationships and classes if you can’t find any suitable ones

• Don’t do as we did: we wrote a new ontology from scratch, rather than reusing FOAF+DC

Don’t reinvent the wheel

• Build linked data for your own consumption first
• You know what your use cases are: better to support these than to second-guess those of unknown future users

• Don’t do as we did: we overcomplicated our data by trying to support all of the plausible scenarios that we could think of, rather than concentrating on what mattered to us

(be glad I couldn't find any clip art for this slide)

Eat your own dogfood

• You should aim to publish as RDF
• Publishing as CSV may get your data out there faster as an interim measure

We used CSV as a ‘glue’ data format between different systems, but chose not to expose data until we could do so as RDF.

Don’t underestimate CSV

Thanks


• @cgutteridge

• http://blogs.ecs.soton.ac.uk/webteam/

http://is.gd/dqiJc